Nsys Profile VLLM Error

gnurse · May 14, 2024, 3:20pm

We have deployed llama3-70b using vLLM on two H100 GPUs (tensor parallelism, TP=2), and I would like to profile its execution with nsys-2024.3.1.75-243134195302v0.

I did not pass any nsys options other than ‘-o’, and the report was generated normally after the program finished running.

In the final results, only a very small number of CUDA GPU kernels were captured.

This is unreasonable: during the test, 10 queries were sent to the server, all of them were executed correctly, and we observed GPU usage through nvidia-smi.

We also observed these potentially related warnings:

Thread count limit is exceeded, not all threads will be shown (thread count: 3762, thread limit: 2000).

CUDA profiling might have not been started correctly.
No CUDA events collected. Does the process use CUDA?

(The second warning appears around 100 times.)

Here is a related vLLM github issue: Error when using nsys profile · Issue #3247 · vllm-project/vllm · GitHub

How should I use or configure nsys to obtain correct results?

hwilper · May 15, 2024, 7:26pm

@liuyis can you help with this?

liuyis · May 15, 2024, 7:34pm

Hi @gnurse, thanks for reaching out. I have a few questions:

Is it possible to share your profiling report?
Is there a way to confirm that your application actually submits more CUDA kernels? For example, is it possible to print a log every time a CUDA kernel is invoked? This would help confirm whether Nsys is missing any kernels, and knowing which specific kernels are missing would also help debug the actual issue (if any).
Can you also try turning on the GPU metrics sampling feature? You can start with the option --gpu-metrics-device=all. That is another way to confirm GPU usage and whether Nsys is missing CUDA kernels.
Finally, is it possible for us to set up the application on our side to test and debug, or to get access to a system that can run the application?
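
For reference, here is a rough sketch of how the two runs could be compared: the original invocation with only -o, and the same invocation with --gpu-metrics-device=all added. The helper name build_nsys_command, the output names, and the vLLM launch command (including the model name) are assumptions for illustration; the thread does not show the actual command line that was used.

```python
import shlex


def build_nsys_command(output, target_cmd, gpu_metrics=False):
    """Return an argv list for `nsys profile` wrapping target_cmd."""
    cmd = ["nsys", "profile", "-o", output]
    if gpu_metrics:
        # Option suggested above: sample GPU metrics on all devices.
        cmd.append("--gpu-metrics-device=all")
    return cmd + list(target_cmd)


# Hypothetical vLLM server launch; the real command is not given in the thread.
server_cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "meta-llama/Meta-Llama-3-70B-Instruct",
    "--tensor-parallel-size", "2",
]

# Baseline run (only -o) and the run with GPU metrics sampling enabled.
baseline = build_nsys_command("vllm_baseline", server_cmd)
with_metrics = build_nsys_command("vllm_gpu_metrics", server_cmd, gpu_metrics=True)

# Print copy-pasteable shell commands.
print(shlex.join(baseline))
print(shlex.join(with_metrics))
```

If the GPU metrics rows in the second report show sustained activity while the CUDA kernel rows stay nearly empty, that would suggest the kernels are running but not being traced, rather than not being launched.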

Thanks,
Liuyi
