Error when trying to compile a Llama 3 checkpoint using trtllm-build

Description

I’m trying to compile a Llama 3 8B-Instruct model with TensorRT-LLM, and the following error occurs when I run the trtllm-build command:

RuntimeError: Unexpected error from cudaGetDeviceCount(). 
Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? 
Error 500: named symbol not found
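
A quick way to check whether the failure is specific to trtllm-build or affects CUDA initialization in general is to query the GPU from inside the container (my own diagnostic commands, not from the tutorial):

    # check whether the driver is visible inside the container
    nvidia-smi
    # check whether PyTorch can initialize CUDA (roughly the same call path trtllm-build goes through)
    python3 -c "import torch; print(torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"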

Environment

TensorRT-LLM Version: 0.8.0
GPU Type: NVIDIA RTX 4060
NVIDIA Driver Version: 555.85
CUDA Version: 12.5
CUDNN Version: 12.1.105
Operating System + Version: Windows 11 Pro (host) - Ubuntu 22.04 (container)
Python Version: 3.10
PyTorch Version: 2.1.2+cu121
Baremetal or Container: nvidia/cuda:12.1.0-devel-ubuntu22.04
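
For reference, a container like this is typically started with GPU passthrough enabled, along the lines of (approximate; not necessarily the exact command I used):

    docker run --rm -it --gpus all nvidia/cuda:12.1.0-devel-ubuntu22.04 bash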

Steps To Reproduce

I am following the tutorial Turbocharging Meta Llama 3 Performance with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server from the NVIDIA blog.
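
The build command I ran follows the tutorial’s pattern; the checkpoint and output directories below are placeholders for my local paths:

    trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_bf16 \
                 --output_dir ./llama3-8b-instruct-engine \
                 --gpt_attention_plugin bfloat16 \
                 --gemm_plugin bfloat16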

Complete execution Log:

[TensorRT-LLM] TensorRT-LLM version: 0.8.0
[05/29/2024-22:30:32] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set gpt_attention_plugin to bfloat16.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set gemm_plugin to bfloat16.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set lookup_plugin to None.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set lora_plugin to None.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set context_fmha to True.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set paged_kv_cache to True.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set remove_input_padding to True.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set multi_block_mode to False.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set enable_xqa to True.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set tokens_per_block to 128.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[05/29/2024-22:30:32] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[05/29/2024-22:30:32] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len. 
It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 497, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 420, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
    torch.cuda.set_device(gpu_id)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found

Hi @lerrana,
This error is usually caused by an NVIDIA NVML driver/library version mismatch.
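
One quick way to confirm a mismatch (optional; these are generic Linux driver commands, not specific to TensorRT-LLM) is to compare the loaded kernel module version with what the user-space tools report:

    cat /proc/driver/nvidia/version    # version of the loaded kernel module
    nvidia-smi                         # reports the user-space driver/NVML version, or an explicit mismatch error

If the versions disagree, or nvidia-smi itself reports a mismatch, unloading and reloading the driver modules usually clears it. The individual steps are below, with the full command sequence consolidated after the list.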

  1. In a terminal, run: lsmod | grep nvidia.

  2. Then unload the modules that depend on the nvidia driver:

    sudo rmmod nvidia_drm
    sudo rmmod nvidia_modeset
    sudo rmmod nvidia_uvm
    
  3. Finally, unload the nvidia module: sudo rmmod nvidia.

  4. Now when you run lsmod | grep nvidia again, you should get no output.

  5. Now run nvidia-smi to check if you get the desired output.
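
Put together, the check-and-unload sequence from the steps above looks like this (a sketch; run it from a text console or SSH session, since unloading nvidia_drm may end an active graphical session):

    lsmod | grep nvidia       # see which nvidia modules are loaded
    sudo rmmod nvidia_drm     # unload the modules that depend on the driver
    sudo rmmod nvidia_modeset
    sudo rmmod nvidia_uvm
    sudo rmmod nvidia         # unload the driver module itself
    lsmod | grep nvidia       # should now print nothing
    nvidia-smi                # should reload the driver and list the GPU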

Thanks