I am trying to run this model locally on Ubuntu 24.04.1 LTS with:
NVIDIA GeForce RTX 3060 Ti
NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2
Starting with
sudo docker run nvcr.io/nim/meta/llama-3.1-405b-instruct
Leads to:
2024-09-19 10:42:55,094 [INFO] PyTorch version 2.3.1 available.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 99, in <module>
    main()
  File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 42, in main
    inference_env = prepare_environment()
  File "/opt/nim/llm/vllm_nvext/entrypoints/args.py", line 154, in prepare_environment
    engine_args, extracted_name = inject_ngc_hub(engine_args)
  File "/opt/nim/llm/vllm_nvext/hub/ngc_injector.py", line 190, in inject_ngc_hub
    system = get_hardware_spec()
  File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 285, in get_hardware_spec
    gpus = GPUInspect()
  File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 93, in __init__
    GPUInspect._safe_exec(cuda.cuInit(0))
  File "cuda/cuda.pyx", line 15966, in cuda.cuda.cuInit
  File "cuda/ccuda.pyx", line 17, in cuda.ccuda.cuInit
  File "cuda/_cuda/ccuda.pyx", line 2684, in cuda._cuda.ccuda._cuInit
  File "cuda/_cuda/ccuda.pyx", line 490, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so.1
Then I googled for a while, found some recommendations, and tried another run.
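For reference, a quick way to confirm that the NVIDIA Container Toolkit actually exposes the GPU to containers is the usual sample workload from its docs (the ubuntu base image here is just an arbitrary choice for the test):

# should print the same nvidia-smi table as on the host if GPU passthrough works
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi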
Starting with
sudo docker run --rm --runtime=nvidia --gpus all nvcr.io/nim/meta/llama-3.1-405b-instruct
Leads to:
2024-09-19 10:34:08,645 [INFO] PyTorch version 2.3.1 available.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 99, in <module>
    main()
  File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 42, in main
    inference_env = prepare_environment()
  File "/opt/nim/llm/vllm_nvext/entrypoints/args.py", line 154, in prepare_environment
    engine_args, extracted_name = inject_ngc_hub(engine_args)
  File "/opt/nim/llm/vllm_nvext/hub/ngc_injector.py", line 190, in inject_ngc_hub
    system = get_hardware_spec()
  File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 285, in get_hardware_spec
    gpus = GPUInspect()
  File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 93, in __init__
    GPUInspect._safe_exec(cuda.cuInit(0))
  File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 101, in _safe_exec
    raise RuntimeError(f"Unexpected error: {status.name}")
RuntimeError: Unexpected error: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE
Now I am stuck. Please advise how to fix this.
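In case it helps with diagnosis: the error name points at the CUDA forward-compatibility path, so one check I can run is whether the image ships a compat libcuda and what the container actually sees. This assumes the NIM image keeps the standard /usr/local/cuda/compat layout of CUDA base images and has bash available for the overridden entrypoint:

# list any forward-compatibility driver libraries bundled in the image,
# then show what nvidia-smi reports from inside the container
sudo docker run --rm --runtime=nvidia --gpus all --entrypoint /bin/bash \
  nvcr.io/nim/meta/llama-3.1-405b-instruct \
  -c 'ls -l /usr/local/cuda/compat/ 2>/dev/null; nvidia-smi'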