Getting Started with NVIDIA NIM tutorial: issues with the NGC Registry

Hello Community,

Just hopped on the LLM train and I'm experimenting with the NVIDIA NIM server.

I am using an Ubuntu 22.04 VM with one NVIDIA A2 GPU.

I am following the tutorial here: https://docs.nvidia.com/nim/large-language-models/latest/introduction.html

I am stuck on this part: https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#launch-nvidia-nim-for-llms

The error is a 402 response (Payment Required):

ngc registry model info --format_type ascii nim/meta/llama3-8b-instruct:1.0
Client Error: 402 Response: Payment Required - Request Id: 6a2a9403-1985022 Url: https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama3-8b-instruct/versions/1.0

I also tried to set up a payment method in my NVIDIA Cloud Account, but it did not help.

Has anyone experienced the same issue?

Thanks in advance!

Hey @andyq – you’ll need to sign up for an NVIDIA AI Enterprise account to pull NIMs. You can sign up for a free trial by going to NVIDIA NIM | llama3-8b and clicking the “Run Anywhere with NIM” button.


Hello @neal.vaidya ,

Thanks for your response!

I tried your suggestion and was able to get the two subscriptions (Private Registry and NVIDIA AI Enterprise Essentials).

However, when I try to pull the container image, I get a 401 error response:

This is the command from https://build.nvidia.com/meta/llama3-8b?snippet_tab=Docker

docker run -it --rm \
    --gpus all \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    nvcr.io/nim/meta/llama3-8b-instruct:1.0.0

And this is the result:

Unable to find image 'nvcr.io/nim/meta/llama3-8b-instruct:1.0.0' locally
docker: Error response from daemon: unauthorized: <html>
<head><title>401 Authorization Required</title></head>
<body>
<center><h1>401 Authorization Required</h1></center>
<hr><center>nginx/1.22.1</center>
</body>
</html>.
See 'docker run --help'.

I also tried generating a new personal API key and giving it access to the NGC Catalog, Private Registry, and Enterprise Models scopes.

Is there anything that I am missing?

Did you redo the docker login nvcr.io with the new API key?
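
For reference, the login looks like this (assuming the standard NGC convention of the literal username $oauthtoken, with your API key as the password):

# Log in to nvcr.io; the username is the literal string $oauthtoken
# (single quotes stop the shell from expanding it), and the password
# is the NGC API/personal key itself.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin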


Hello @neal.vaidya

I tried logging in with the new personal API key, and Docker is now able to pull the NIM image from nvcr.io.

However, at the end, the container hits another issue while fetching the model, which looks like this:

===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================

NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama3-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement here:
https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/.
A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the AI Foundation Models Community License
here: https://docs.nvidia.com/ai-foundation-models-community-license.pdf.

ADDITIONAL INFORMATION: Meta Llama 3 Community License, Built with Meta Llama 3.
A copy of the Llama 3 license can be found under /opt/nim/MODEL_LICENSE.

2024-07-23 16:10:05,615 [INFO] PyTorch version 2.2.2 available.
2024-07-23 16:10:06,158 [WARNING] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
2024-07-23 16:10:06,158 [INFO] [TRT-LLM] [I] Starting TensorRT-LLM init.
2024-07-23 16:10:06,276 [INFO] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.10.1.dev2024053000
INFO 07-23 16:10:07.164 api_server.py:489] NIM LLM API version 1.0.0
INFO 07-23 16:10:07.165 ngc_profile.py:217] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.
INFO 07-23 16:10:07.166 ngc_profile.py:219] Detected 1 compatible profile(s).
INFO 07-23 16:10:07.166 ngc_injector.py:106] Valid profile: 8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d (vllm-fp16-tp1) on GPUs [0]
INFO 07-23 16:10:07.166 ngc_injector.py:141] Selected profile: 8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d (vllm-fp16-tp1)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/entrypoints/openai/api_server.py", line 492, in <module>
    engine_args, extracted_name = inject_ngc_hub(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/hub/ngc_injector.py", line 143, in inject_ngc_hub
    repo = optimal_config.workspace()
Exception: Error {
    context: "initializing ngc repo from repo_id: ngc://nim/meta/llama3-8b-instruct:hf",
    source: CommonError(
        Error {
            context: "fetching file_map",
            source: CommonError(
                Error {
                    context: "get bearer token",
                    source: CommonError(
                        "Authentication required; however no API key is detected.\nPlease set the env variable NGC_API_KEY with the API key acquired from NGC. See: https://org.ngc.nvidia.com/setup/api-key.",
                    ),
                },
            ),
        },
    ),
}

The API key I generated is a personal key as mentioned in the tutorial.
So should I use an API key instead of a personal key?

@andyq – you should be able to use either one. I think the issue here is that the NGC_API_KEY environment variable isn’t reaching the container. So basically, whatever API/personal key you used for the docker login, set that as an environment variable with

export NGC_API_KEY=<my api key>
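
The run command below also mounts $LOCAL_NIM_CACHE, so if you haven't defined that yet, point it at any writable host directory first (the path here is just an example):

# Host directory where the NIM caches downloaded model files
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"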

then, when launching the NIM, make sure to forward that environment variable to the container with -e NGC_API_KEY in your docker run command like so:

docker run -it --rm \
    --gpus all \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.0

Hello @neal.vaidya,

I figured out the issue: it was in how the environment variable is passed to the “docker run” command:

docker run -it --rm \
    --gpus all \
    --shm-size=16GB \
    -e NGC_API_KEY=<your_api_key> \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.0

I referenced the solution in https://forums.developer.nvidia.com/t/nim-nim-meta-llama3-8b-instruct-no-api-key-is-detected/298215, which helped me get through it.
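
For anyone landing here later: once the container logs show the server is up, a quick sanity check is to hit the OpenAI-compatible endpoint that the NIM exposes on port 8000 (a minimal sketch; the model name below assumes the llama-3.1 image from the command above):

# Send a small chat completion request to the local NIM server
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'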

Thanks for your help along the way!
