Description:
Issue Summary:
I am attempting to call the Phi-4-Multimodal-Instruct model via the NVIDIA API (https://build.nvidia.com/microsoft/phi-4-multimodal-instruct) with both image and audio inputs. However, my requests fail with a “NetworkError when attempting to fetch resource.”
Despite having 900+ credits in my account, I am unable to submit successful API requests.
Steps to Reproduce:
- Encode an image (image.png) and an audio file (audio.wav) in base64.
- Construct a request following NVIDIA’s API documentation.
- Send the request to https://integrate.api.nvidia.com/v1/chat/completions.
- The request fails with a NetworkError.
Code Snippet:
import base64

import requests

invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"
stream = True

# Base64-encode both media files so they can be embedded inline in the prompt.
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
with open("audio.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

# Guard on the combined length of the two base64 strings.
assert len(image_b64) + len(audio_b64) < 180_000, \
    "To upload larger images and/or audios, use the assets API (see docs)"

headers = {
    "Authorization": "Bearer <API_KEY>",
    "Accept": "text/event-stream" if stream else "application/json"
}

payload = {
    "model": "microsoft/phi-4-multimodal-instruct",
    "messages": [
        {
            "role": "user",
            "content": f'Answer the spoken query about the image.<img src="data:image/png;base64,{image_b64}" /><audio src="data:audio/wav;base64,{audio_b64}" />'
        }
    ],
    "max_tokens": 512,
    "temperature": 0.10,
    "top_p": 0.70,
    "stream": stream
}

response = requests.post(invoke_url, headers=headers, json=payload)

if stream:
    # Print each non-empty line of the event-stream response.
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
else:
    print(response.json())
Error Message:
NetworkError when attempting to fetch resource.
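The message above is generic and appears to match what a browser reports for a failed fetch() call, which hides the HTTP status and response body. As a rough diagnostic sketch, the helper below sends the same kind of JSON request and reports a more specific failure reason. It uses the standard library's urllib instead of requests only to stay dependency-free; `describe_failure` and `post_json` are hypothetical names introduced here, and the 60-second timeout is an assumption.

```python
import json
import socket
import urllib.error
import urllib.request


def describe_failure(exc: Exception) -> str:
    """Turn an exception raised by urlopen into a short, specific description."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP {exc.code} {exc.reason}"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout while connecting"
        return f"connection failed: {exc.reason}"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout while reading"
    return f"unexpected {type(exc).__name__}: {exc}"


def post_json(url: str, api_key: str, payload: dict, timeout: float = 60.0) -> str:
    """POST a JSON payload; return the response body or a failure description."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except Exception as exc:  # broad on purpose: this is a diagnostic helper
        return describe_failure(exc)
```

Calling `post_json(invoke_url, "<API_KEY>", payload)` with `"stream": False` in the payload would then print either the JSON body or a line such as an HTTP status with its reason, which is usually easier to act on than the generic NetworkError.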
Additional Information:
- Credits Available: 900+ (not an issue of running out of credits).
- Text-only requests work, but adding image and audio results in a NetworkError.
- The combined base64-encoded image and audio are under 180,000 characters (the assert in the snippet passes), so the request should not require the assets API.
- I have verified the API key and endpoint URL are correct.
- I have tested from multiple networks, ruling out local connectivity issues.
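To double-check the size point, a small sketch like the one below can report both the combined base64 length and the size of the full JSON body. Base64 turns every 3 raw bytes into 4 characters, so 180,000 characters corresponds to roughly 135,000 raw bytes of media. The helper names are hypothetical, and whether the limit applies to the media alone or to the whole request body is an assumption that the source does not settle.

```python
import json

# Threshold taken from the assert in the snippet; its exact scope is an assumption.
INLINE_LIMIT = 180_000


def base64_length(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def payload_report(payload: dict, image_b64: str, audio_b64: str) -> dict:
    """Summarise the media size and the serialized request body size."""
    media_chars = len(image_b64) + len(audio_b64)
    body_bytes = len(json.dumps(payload).encode("utf-8"))
    return {
        "media_chars": media_chars,
        "body_bytes": body_bytes,
        "media_under_limit": media_chars < INLINE_LIMIT,
        "body_under_limit": body_bytes < INLINE_LIMIT,
    }
```

Running `payload_report(payload, image_b64, audio_b64)` before sending shows whether the media passes the check while the full body, which also carries the prompt text and JSON overhead, does not.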
Request for Support:
- Is there an issue with the NVIDIA API handling multimodal requests with both image and audio?
- Are there any API limits or restrictions causing this NetworkError?
- Could you provide guidance on troubleshooting or an alternative approach?
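One possible way to narrow this down, since text-only requests succeed, is to send the same prompt with each modality on its own and then with both together. The sketch below only builds the four content strings, reusing the inline tag format from the snippet; `build_content` and `isolation_cases` are hypothetical helper names, and sending each case is left to the existing request code.

```python
from typing import Dict, Optional


def build_content(prompt: str,
                  image_b64: Optional[str] = None,
                  audio_b64: Optional[str] = None) -> str:
    """Build the message content, embedding only the media that is provided."""
    parts = [prompt]
    if image_b64 is not None:
        parts.append(f'<img src="data:image/png;base64,{image_b64}" />')
    if audio_b64 is not None:
        parts.append(f'<audio src="data:audio/wav;base64,{audio_b64}" />')
    return "".join(parts)


def isolation_cases(prompt: str, image_b64: str, audio_b64: str) -> Dict[str, str]:
    """Return one content string per modality combination, simplest first."""
    return {
        "text_only": build_content(prompt),
        "image_only": build_content(prompt, image_b64=image_b64),
        "audio_only": build_content(prompt, audio_b64=audio_b64),
        "image_and_audio": build_content(prompt, image_b64, audio_b64),
    }
```

If only one of the cases fails, that points at the corresponding media type or its encoding; if only the combined case fails, request size or a limit on mixed inputs becomes the more likely cause.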
Thank you for your assistance!