CUDA fails to start: local NIM container run failed

I am trying to run this model locally on Ubuntu 24.04.1 LTS with:
NVIDIA GeForce RTX 3060 Ti
NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2
Starting with
sudo docker run nvcr.io/nim/meta/llama-3.1-405b-instruct
Leads to:

2024-09-19 10:42:55,094 [INFO] PyTorch version 2.3.1 available.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 99, in <module>
main()
File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 42, in main
inference_env = prepare_environment()
File "/opt/nim/llm/vllm_nvext/entrypoints/args.py", line 154, in prepare_environment
engine_args, extracted_name = inject_ngc_hub(engine_args)
File "/opt/nim/llm/vllm_nvext/hub/ngc_injector.py", line 190, in inject_ngc_hub
system = get_hardware_spec()
File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 285, in get_hardware_spec
gpus = GPUInspect()
File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 93, in __init__
GPUInspect._safe_exec(cuda.cuInit(0))
File "cuda/cuda.pyx", line 15966, in cuda.cuda.cuInit
File "cuda/ccuda.pyx", line 17, in cuda.ccuda.cuInit
File "cuda/_cuda/ccuda.pyx", line 2684, in cuda._cuda.ccuda._cuInit
File "cuda/_cuda/ccuda.pyx", line 490, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so.1

Then I googled for a while, found some recommendations, and tried another run.

Starting with
sudo docker run --rm --runtime=nvidia --gpus all nvcr.io/nim/meta/llama-3.1-405b-instruct
Leads to:

2024-09-19 10:34:08,645 [INFO] PyTorch version 2.3.1 available.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 99, in <module>
main()
File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 42, in main
inference_env = prepare_environment()
File "/opt/nim/llm/vllm_nvext/entrypoints/args.py", line 154, in prepare_environment
engine_args, extracted_name = inject_ngc_hub(engine_args)
File "/opt/nim/llm/vllm_nvext/hub/ngc_injector.py", line 190, in inject_ngc_hub
system = get_hardware_spec()
File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 285, in get_hardware_spec
gpus = GPUInspect()
File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 93, in __init__
GPUInspect._safe_exec(cuda.cuInit(0))
File "/opt/nim/llm/vllm_nvext/hub/hardware_inspect.py", line 101, in _safe_exec
raise RuntimeError(f"Unexpected error: {status.name}")
RuntimeError: Unexpected error: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE

Now I am stuck. Please advise how to fix this?

Update the GPU driver on the base machine to the latest available for your GPU. And it’s entirely possible that the llama-3.1-405b-instruct container won’t run on your GPU, although that isn’t the error you have run into yet.
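
For example, something like the following (this assumes the ubuntu-drivers-common package is installed, and the CUDA image tag is only an illustration — pick one matching your driver):

# Check which driver and CUDA version the host currently exposes
nvidia-smi

# List the driver packages Ubuntu recommends for this GPU
sudo ubuntu-drivers devices

# Install the newest recommended driver, then reboot
sudo ubuntu-drivers install
sudo reboot

# After rebooting, sanity-check that Docker can see the GPU at all
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi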

Thank you, Robert.
Just for your Knowledge Base:
The 550 drivers do not work properly on Ubuntu 24.04 with the RTX 3060 Ti.
After downgrading to Ubuntu 22.04, the 550 drivers with CUDA 12.4 became available.
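
For anyone reading later, installing the 550 driver on Ubuntu 22.04 was roughly this (the package name may vary with your repository setup):

sudo apt install nvidia-driver-550
sudo reboot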
I am using another, smaller model, since as you suggested the current one cannot run on my hardware.
The container ran successfully and downloaded additional data.
After that I got this error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU
The total free memory is 6+ GB, and I am confused why just 224 MiB cannot be allocated.
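For reference, I double-checked the per-GPU memory figures with an nvidia-smi query:

nvidia-smi --query-gpu=memory.used,memory.free,memory.total --format=csv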
How can I fix this?