Title: 401 Unauthorized when calling NVIDIA Integrate API (/v1/chat/completions) from container (API key works for /v1/models but fails for chat)

I’m running a containerized service (FastAPI + LangChain NVIDIA endpoint client) that uses the NVIDIA Integrate API for text generation.

The key is loaded correctly inside the container and is visible via echo $NVIDIA_API_KEY (length ~70 chars).
However, when making a POST request to https://integrate.api.nvidia.com/v1/chat/completions, the API consistently returns: HTTPError: HTTP Error 401: Unauthorized
Authentication failed
Please check or regenerate your API key. This happens for both models:

  • meta/llama-3.1-70b-instruct

  • meta/llama-3.1-8b-instruct What’s interesting:

    • The same key works fine for GET https://integrate.api.nvidia.com/v1/models (HTTP 200 OK).

    • The key has no trailing spaces or newlines.

    • We have verified Authorization: Bearer <key> and correct Content-Type: application/json.

    • Requests are made directly from inside the container with curl or Python urllib.request.

    • The container runs on Ubuntu 22.04 with curl 8.14 and Python 3.9

    • I would highly appreciate insights on it.

  • Cheers!