Running NIM llama-3_1-8b-instruct fails in On-Prem deployment

I am trying to deploy the llama-3_1-8b-instruct NIM container for on-prem inferencing.

I am getting the error below. The system is behind a firewall, but I was able to use huggingface-cli to download models since proxy settings are in place. I was also able to pull the NIM image, but when it runs it tries to download additional files and fails. Need help.

INFO 2024-10-09 08:42:35.718 ngc_injector.py:206] Selected profile: 3bb4e8fe78e5037b05dd618cebb1053347325ad6a1e709e0eb18bb8558362ac5 (vllm-bf16-tp1)
INFO 2024-10-09 08:42:35.719 ngc_injector.py:214] Profile metadata: feat_lora: false
INFO 2024-10-09 08:42:35.719 ngc_injector.py:214] Profile metadata: llm_engine: vllm
INFO 2024-10-09 08:42:35.719 ngc_injector.py:214] Profile metadata: precision: bf16
INFO 2024-10-09 08:42:35.719 ngc_injector.py:214] Profile metadata: tp: 1
INFO 2024-10-09 08:42:35.719 ngc_injector.py:245] Preparing model workspace. This step might download additional files to run the model.
[10-09 08:42:49.652 ERROR nim_sdk::hub::repo rust/nim-sdk/src/hub/repo.rs:117] One or more errors fetching files:
[10-09 08:42:49.652 ERROR nim_sdk::hub::repo rust/nim-sdk/src/hub/repo.rs:119] error sending request for url
(error repeats)
[10-09 08:42:49.652 ERROR nim_sdk::hub::repo rust/nim-sdk/src/hub/repo.rs:119] error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-8b-instruct/hf-8c22764-nim1.2/files)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/nim/llm/vllm_nvext/entrypoints/openai/api_server.py", line 654, in <module>
    inference_env = prepare_environment()
  File "/opt/nim/llm/vllm_nvext/entrypoints/args.py", line 155, in prepare_environment
    engine_args, extracted_name = inject_ngc_hub(engine_args)
  File "/opt/nim/llm/vllm_nvext/hub/ngc_injector.py", line 247, in inject_ngc_hub
    cached = repo.get_all()
Exception: error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-8b-instruct/hf-8c22764-nim1.2/files)

Hi @manjunath.janardhan1 – if your system is behind a firewall you will likely need to follow the steps in the documentation for Serving Models from Local Assets.
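One thing worth ruling out first (an assumption on my part, not confirmed from the logs): `docker run` does not inherit the host shell's proxy settings, so even if huggingface-cli works on the host, the container may have no route to api.ngc.nvidia.com unless the proxy variables are passed in explicitly. A rough sketch, with a placeholder proxy URL:

```shell
# Placeholder proxy URL -- substitute your real corporate proxy.
PROXY_URL="http://proxy.example.com:3128"

# Pass the proxy into the NIM container explicitly (commented out because it
# needs the NIM image and an NGC_API_KEY; image tag is an assumption):
# docker run --rm --gpus all \
#   -e NGC_API_KEY \
#   -e https_proxy="$PROXY_URL" -e HTTPS_PROXY="$PROXY_URL" \
#   -e no_proxy=localhost,127.0.0.1 \
#   nvcr.io/nim/meta/llama-3.1-8b-instruct:latest

# Sanity check that the variables export as expected on the host:
export https_proxy="$PROXY_URL" HTTPS_PROXY="$PROXY_URL"
echo "https_proxy=$https_proxy"
```

Inside a running container, `env | grep -i proxy` should show the same values; if it prints nothing, the download failures above are consistent with the container having no proxy configured.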

Thanks, I tried that but hit the same issue when running it inside the container. Is there a way to download the cache on another machine that is not behind the firewall and copy it over?

nim@ad54971136af:/$ download-to-cache --profile 3bb4e8fe78e5037b05dd618cebb1053347325ad6a1e709e0eb18bb8558362ac5
INFO 2024-10-09 16:01:27.408 pre_download.py:80] Fetching contents for profile 3bb4e8fe78e5037b05dd618cebb1053347325ad6a1e709e0eb18bb8558362ac5
INFO 2024-10-09 16:01:27.409 pre_download.py:86] {
"feat_lora": "false",
"llm_engine": "vllm",
"precision": "bf16",
"tp": "1"
}
[10-09 16:03:23.453 ERROR nim_sdk::hub::repo rust/nim-sdk/src/hub/repo.rs:117] One or more errors fetching files:
[10-09 16:03:23.453 ERROR nim_sdk::hub::repo rust/nim-sdk/src/hub/repo.rs:119] error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-8b-instruct/hf-8c22764-nim1.2/files)
[10-09 16:03:23.453 ERROR nim_sdk::hub::repo rust/nim-sdk/src/hub/repo.rs:119] error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-8b-instruct/hf-8c22764-nim1.2/files)
(error repeats)

Traceback (most recent call last):
  File "/opt/nim/llm/.venv/bin/download-to-cache", line 6, in <module>
    sys.exit(download_to_cache())
  File "/opt/nim/llm/vllm_nvext/hub/pre_download.py", line 90, in download_to_cache
    cached_files = repo.get_all()
Exception: error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-8b-instruct/hf-8c22764-nim1.2/files)
nim@ad54971136af:/$
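Copying the cache from another machine should be workable in principle, since the NIM cache is an ordinary directory mounted into the container. A sketch of one possible approach (the cache path, mount point, and image tag below are assumptions based on the default NIM layout, not confirmed for this setup):

```shell
# Sketch: populate the NIM model cache on an internet-connected machine,
# archive it, and carry the archive to the firewalled host.

NIM_CACHE="$HOME/.cache/nim"   # assumed host cache dir, mounted at /opt/nim/.cache
mkdir -p "$NIM_CACHE"

# 1) On the connected machine, fill the cache (commented out; needs the NIM
#    image and an NGC_API_KEY). The profile ID is the one from the logs above.
# docker run --rm --gpus all \
#   -e NGC_API_KEY \
#   -v "$NIM_CACHE:/opt/nim/.cache" \
#   nvcr.io/nim/meta/llama-3.1-8b-instruct:latest \
#   download-to-cache --profile 3bb4e8fe78e5037b05dd618cebb1053347325ad6a1e709e0eb18bb8558362ac5

# 2) Archive the populated cache directory:
tar -czf nim-cache.tar.gz -C "$NIM_CACHE" .

# 3) Copy nim-cache.tar.gz to the firewalled host, unpack it into the same
#    cache directory there, and start the container with that directory mounted:
# tar -xzf nim-cache.tar.gz -C "$HOME/.cache/nim"
```

With the cache pre-populated and mounted, the container should find the model files locally instead of reaching out to api.ngc.nvidia.com; the air-gapped deployment section of the NIM documentation covers the supported variant of this workflow.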