Hello NVIDIA folks,
I am using the vision-instruct model below through my enterprise credits (and the Python code** at the end of this post). I cannot figure out how to keep using it once my credit tokens are exhausted.
Some forum threads suggest self-hosting with Docker/Helm charts or switching to a Hugging Face Enterprise subscription, but that looks too complex to set up. I would like to keep using the API keys and the Python code** against the NVIDIA cloud:
https://forums.developer.nvidia.com/t/nim-api-credits/305703
https://docs.nvidia.com/nim/large-language-models/latest/deploy-helm.html
https://build.nvidia.com/meta/llama-3_1-70b-instruct?snippet_tab=Docker
Could someone please point me in the right direction?
Python code**:
import base64

import requests

invoke_url = "https://ai.api.nvidia.com/v1/gr/meta/llama-3.2-90b-vision-instruct/chat/completions"
stream = True

# Read the image and inline it as base64 (works only for small images).
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Larger images must go through the assets API (see docs and the sketch below).
assert len(image_b64) < 180_000, \
    "To upload larger images, use the assets API (see docs)"

headers = {
    # Replace with your own API key when executing outside NGC.
    "Authorization": "Bearer $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC",
    "Accept": "text/event-stream" if stream else "application/json",
}

payload = {
    "model": "meta/llama-3.2-90b-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": f'What is in this image? <img src="data:image/png;base64,{image_b64}" />',
        }
    ],
    "max_tokens": 512,
    "temperature": 1.00,
    "top_p": 1.00,
    "stream": stream,
}

# stream=stream makes requests yield lines as they arrive instead of buffering.
response = requests.post(invoke_url, headers=headers, json=payload, stream=stream)

if stream:
    # Raw server-sent events; see the parsing sketch below for extracting text.
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
else:
    print(response.json())
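In case it helps, here is a minimal sketch for turning the raw event stream into plain text. It assumes the endpoint emits OpenAI-style SSE chunks (lines of the form `data: {...json...}`, ending with `data: [DONE]`) and reuses `response` from the snippet above:

import json

for line in response.iter_lines():
    if not line:
        continue
    decoded = line.decode("utf-8")
    if not decoded.startswith("data: "):
        continue
    data = decoded[len("data: "):]
    if data == "[DONE]":  # end-of-stream sentinel
        break
    chunk = json.loads(data)
    # Each chunk carries an incremental token in choices[0].delta.content.
    delta = chunk["choices"][0].get("delta", {}).get("content")
    if delta:
        print(delta, end="", flush=True)
print()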
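And here is a rough sketch of the assets-API path mentioned in the assert above, pieced together from my reading of the NVCF docs. The endpoint (`https://api.nvcf.nvidia.com/v2/nvcf/assets`), the `NVCF-INPUT-ASSET-REFERENCES` header, and the `asset_id` reference syntax are my assumptions from those docs, so please verify them there before relying on this:

import requests

api_key = "$API_KEY"  # your NVIDIA API key
assets_url = "https://api.nvcf.nvidia.com/v2/nvcf/assets"  # assumed endpoint

# 1) Ask NVCF for a pre-signed upload URL and an asset ID.
resp = requests.post(
    assets_url,
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json={"contentType": "image/png", "description": "input image"},
)
resp.raise_for_status()
asset = resp.json()
asset_id, upload_url = asset["assetId"], asset["uploadUrl"]

# 2) PUT the raw image bytes to the pre-signed URL.
with open("image.png", "rb") as f:
    requests.put(
        upload_url,
        headers={
            "Content-Type": "image/png",
            "x-amz-meta-nvcf-asset-description": "input image",
        },
        data=f.read(),
    ).raise_for_status()

# 3) Invoke the model, referencing the asset instead of inline base64.
invoke_url = "https://ai.api.nvidia.com/v1/gr/meta/llama-3.2-90b-vision-instruct/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Accept": "application/json",
    "NVCF-INPUT-ASSET-REFERENCES": asset_id,  # assumed header name
}
payload = {
    "model": "meta/llama-3.2-90b-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": f'What is in this image? <img src="data:image/png;asset_id,{asset_id}" />',
        }
    ],
    "max_tokens": 512,
}
print(requests.post(invoke_url, headers=headers, json=payload).json())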