I’m running a containerized service (FastAPI + LangChain NVIDIA endpoint client) that uses the NVIDIA Integrate API for text generation.
The key is loaded correctly inside the container and is visible via echo $NVIDIA_API_KEY (length ~70 chars).
However, when making a POST request to https://integrate.api.nvidia.com/v1/chat/completions, the API consistently returns: HTTPError: HTTP Error 401: Unauthorized
Authentication failed
Please check or regenerate your API key. This happens for both models:
-
meta/llama-3.1-70b-instruct -
meta/llama-3.1-8b-instructWhat’s interesting:-
The same key works fine for
GET https://integrate.api.nvidia.com/v1/models(HTTP 200 OK). -
The key has no trailing spaces or newlines.
-
We have verified
Authorization: Bearer <key>and correctContent-Type: application/json. -
Requests are made directly from inside the container with
curlor Pythonurllib.request. -
The container runs on Ubuntu 22.04 with
curl 8.14and Python 3.9 -
I would highly appreciate insights on it.
-
-
Cheers!