Cannot connect to endpoint https://integrate.api.nvidia.com/v1

Hi team,

I am trying to use NVIDIA NIM via the OpenAI SDK, as shown below:

from openai import OpenAI

client = OpenAI(
  base_url = "https://integrate.api.nvidia.com/v1",
  api_key = "$API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC"
)

completion = client.chat.completions.create(
  model="meta/llama-3.3-70b-instruct",
  messages=[{"role":"user","content":"Write a limerick about the wonders of GPU computing."}],
  temperature=0.2,
  top_p=0.7,
  max_tokens=1024,
  stream=True
)

for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")

But the request takes a very long time and never returns a response. Can you help me with this case?

Thank you!

Hi @hovancon1998

The NVIDIA API catalog offers a no-cost trial experience of NVIDIA NIM, and you may experience extended wait times during periods of high load. To ensure consistent performance, we recommend the following options:

  1. Self-host the API on your cloud provider or on-premises. Research and test use is free with ‘NVIDIA Developer Program’ access. Please note that your organization must have an NVIDIA AI Enterprise license for production use.
  2. Use the serverless NIM API on Hugging Face with pay-per-use pricing. The NVIDIA AI Enterprise license is included with this option, so you don’t need a separate license.
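
Whichever option you choose, a client-side timeout can make a stalled request fail quickly instead of hanging. The following is a minimal sketch, not an official NVIDIA recommendation. It assumes the OpenAI Python SDK v1 or later, whose `OpenAI` constructor accepts `timeout` and `max_retries`. The helper names `client_settings` and `stream_limerick`, the `NVIDIA_API_KEY` environment variable, and the 30-second value are illustrative assumptions.

```python
import os

# Hosted endpoint from the original post.
HOSTED_URL = "https://integrate.api.nvidia.com/v1"


def client_settings(base_url=HOSTED_URL, timeout_s=30.0, max_retries=2):
    """Build keyword arguments for openai.OpenAI (hypothetical helper)."""
    return {
        "base_url": base_url,
        # NVIDIA_API_KEY is an assumed environment variable name.
        "api_key": os.environ.get("NVIDIA_API_KEY", "missing-key"),
        "timeout": timeout_s,        # timeout in seconds passed to the SDK's HTTP client
        "max_retries": max_retries,  # how many times the SDK retries a failed request
    }


def stream_limerick(settings):
    """Stream one completion and report a timeout instead of hanging (hypothetical helper)."""
    # Imported here so client_settings works even without the SDK installed.
    from openai import OpenAI, APITimeoutError

    client = OpenAI(**settings)
    try:
        completion = client.chat.completions.create(
            model="meta/llama-3.3-70b-instruct",
            messages=[{"role": "user", "content": "Write a limerick about the wonders of GPU computing."}],
            temperature=0.2,
            top_p=0.7,
            max_tokens=1024,
            stream=True,
        )
        for chunk in completion:
            # Some chunks may carry no choices, so check before indexing.
            if chunk.choices and chunk.choices[0].delta.content is not None:
                print(chunk.choices[0].delta.content, end="")
    except APITimeoutError:
        # Raised by the SDK when the request exceeds the configured timeout.
        print("Request timed out; the hosted endpoint may be under high load.")
```

Calling `stream_limerick(client_settings())` targets the hosted endpoint. For a self-hosted NIM (option 1), pass its address as `base_url`, for example `client_settings(base_url="http://localhost:8000/v1")`, where the host and port are assumptions that depend on your deployment.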