Hey @andyq – you’ll need to sign up for an NVIDIA AI Enterprise account to pull NIMs. You can sign up for a free trial by going to the NVIDIA NIM | llama3-8b page and clicking the “Run Anywhere with NIM” button.
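For reference, once you have the key, logging Docker in to the NGC registry should look like this (per the NGC setup docs, the username is the literal string $oauthtoken and the password is your key):

docker login nvcr.io
Username: $oauthtoken
Password: <your personal/API key>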
I tried logging in with the new personal API key, and I can now pull images from NIM.
However, at the end, there is another issue when pulling one of the images; it looks like this:
===========================================
== NVIDIA Inference Microservice LLM NIM ==
===========================================
NVIDIA Inference Microservice LLM NIM Version 1.0.0
Model: nim/meta/llama3-8b-instruct
Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This NIM container is governed by the NVIDIA AI Product Agreement here:
https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/.
A copy of this license can be found under /opt/nim/LICENSE.
The use of this model is governed by the AI Foundation Models Community License
here: https://docs.nvidia.com/ai-foundation-models-community-license.pdf.
ADDITIONAL INFORMATION: Meta Llama 3 Community License, Built with Meta Llama 3.
A copy of the Llama 3 license can be found under /opt/nim/MODEL_LICENSE.
2024-07-23 16:10:05,615 [INFO] PyTorch version 2.2.2 available.
2024-07-23 16:10:06,158 [WARNING] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
2024-07-23 16:10:06,158 [INFO] [TRT-LLM] [I] Starting TensorRT-LLM init.
2024-07-23 16:10:06,276 [INFO] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.10.1.dev2024053000
INFO 07-23 16:10:07.164 api_server.py:489] NIM LLM API version 1.0.0
INFO 07-23 16:10:07.165 ngc_profile.py:217] Running NIM without LoRA. Only looking for compatible profiles that do not support LoRA.
INFO 07-23 16:10:07.166 ngc_profile.py:219] Detected 1 compatible profile(s).
INFO 07-23 16:10:07.166 ngc_injector.py:106] Valid profile: 8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d (vllm-fp16-tp1) on GPUs [0]
INFO 07-23 16:10:07.166 ngc_injector.py:141] Selected profile: 8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d (vllm-fp16-tp1)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/entrypoints/openai/api_server.py", line 492, in <module>
    engine_args, extracted_name = inject_ngc_hub(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm_nvext/hub/ngc_injector.py", line 143, in inject_ngc_hub
    repo = optimal_config.workspace()
Exception: Error {
    context: "initializing ngc repo from repo_id: ngc://nim/meta/llama3-8b-instruct:hf",
    source: CommonError(
        Error {
            context: "fetching file_map",
            source: CommonError(
                Error {
                    context: "get bearer token",
                    source: CommonError(
                        "Authentication required; however no API key is detected.\nPlease set the env variable NGC_API_KEY with the API key acquired from NGC. See: https://org.ngc.nvidia.com/setup/api-key.",
                    ),
                },
            ),
        },
    ),
}
The key I generated is a personal key, as mentioned in the tutorial.
So should I use an API key instead of a personal key?
@andyq – you should be able to use either one. I think the missing piece here is the NGC_API_KEY environment variable. Basically, whatever API/personal key you used for the docker login, set it as an environment variable with
export NGC_API_KEY=<my api key>
then, when launching the NIM, make sure to forward that environment variable to the container with -e NGC_API_KEY in your docker run command, like so:
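A minimal sketch based on the standard NIM launch command (the LOCAL_NIM_CACHE variable, port, and 1.0.0 tag are assumptions; adjust them for your setup):

# assumes you have already run: export NGC_API_KEY=<my api key>
# and created a local model cache: export LOCAL_NIM_CACHE=~/.cache/nim
docker run -it --rm \
    --gpus all \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    nvcr.io/nim/meta/llama3-8b-instruct:1.0.0

Note that -e NGC_API_KEY with no =value tells Docker to forward the variable’s current value from your shell, so run the export in the same shell before docker run.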