I created an API key from https://build.nvidia.com/ and successfully integrated it with Open WebUI.
Using this setup, I am able to access and get responses from some models, for example:
-
GPT-OSS 120B
-
Meta LLaMA models
However, I am unable to get any output from certain other models, such as:
- IBM Granite (and a few others)
Additionally, I do not see any API usage dashboard in my NVIDIA account.
The only information shown is:
Your API Rate Limit: Up to 40 RPM
There is no visibility into:
-
Request counts
-
Token usage
-
Per-model usage or errors