Support for vision models after the 4000 enterprise credits are exhausted - onboarding to a paid subscription

Hello NVIDIA folks,
I am using the following vision instruct model with my enterprise credits, via the Python code** below.

I cannot figure out how to keep using it once my credits are exhausted.

Some forum threads suggest deploying with Docker or Helm charts, or using a Hugging Face Enterprise subscription.
Both look too complex to set up. I would like to keep using API keys and the Python code** against the NVIDIA cloud.

https://forums.developer.nvidia.com/t/nim-api-credits/305703
https://docs.nvidia.com/nim/large-language-models/latest/deploy-helm.html
https://build.nvidia.com/meta/llama-3_1-70b-instruct?snippet_tab=Docker

Could someone please point me in the right direction?

Python code**

import base64

import requests

invoke_url = "https://ai.api.nvidia.com/v1/gr/meta/llama-3.2-90b-vision-instruct/chat/completions"
stream = True

# Read the image and base64-encode it so it can be inlined in the prompt.
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

assert len(image_b64) < 180_000, \
    "To upload larger images, use the assets API (see docs)"

headers = {
    # Replace the placeholder with your own API key.
    "Authorization": "Bearer $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC",
    "Accept": "text/event-stream" if stream else "application/json"
}

payload = {
    "model": "meta/llama-3.2-90b-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": f'What is in this image? <img src="data:image/png;base64,{image_b64}" />'
        }
    ],
    "max_tokens": 512,
    "temperature": 1.00,
    "top_p": 1.00,
    "stream": stream
}

# stream=stream lets requests yield lines as they arrive instead of buffering the whole body.
response = requests.post(invoke_url, headers=headers, json=payload, stream=stream)
# Fail loudly on HTTP errors (for example, a rejected key) instead of printing the error body.
response.raise_for_status()

if stream:
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
else:
    print(response.json())
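
In case it helps anyone reusing the snippet above, here is a rough sketch of two small helpers: one reads the API key from an environment variable instead of hard-coding it, and one pulls the generated text out of the streamed lines rather than printing them raw. Both helper names and the variable name NVIDIA_API_KEY are my own choices, not anything from the NVIDIA docs. The parser also assumes the stream uses OpenAI-style server-sent events ("data: {...}" lines ending with "data: [DONE]"), which I have not confirmed for every model.

```python
import json
import os


def build_headers(stream: bool, env_var: str = "NVIDIA_API_KEY") -> dict:
    """Build request headers, reading the key from an environment variable.

    The variable name NVIDIA_API_KEY is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Accept": "text/event-stream" if stream else "application/json",
    }


def extract_text(lines) -> str:
    """Join the text deltas found in streamed response lines.

    Assumption: each payload line looks like 'data: {json}' in the
    OpenAI-style chunk format, and the stream ends with 'data: [DONE]'.
    """
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and anything unexpected
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

With these, the request would use headers=build_headers(stream), and the streaming branch could become print(extract_text(response.iter_lines())). It is probably also worth checking response.status_code before parsing, so an error response is not mistaken for an empty answer.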