Using ncu to profile Python code that calls CUDA kernels

xwentian · May 30, 2024, 6:44am

Hi,

I tried to collect metrics for CUDA kernels that are called from Python code on a remote machine equipped with eight A800 cards. The command I used was "ncu --device 0 --target-processes all --section SpeedOfLight -o paged_attention -k paged_attention_v1_kernel python benchmark_paged_attention.py".
However, the output reported that no kernels were profiled, as shown below.
$ ncu --device 0 --target-processes all --section SpeedOfLight -o paged_attention -k paged_attention_v1_kernel python benchmark_paged_attention.py
Namespace(batch_size=8, block_size=16, dtype='half', head_size=128, kv_cache_dtype='auto', num_kv_heads=8, num_query_heads=64, profile=False, seed=0, seq_len=4096, use_alibi=False, version='v2')
==PROF== Connected to process 30384 (/home/xwentian/.conda/envs/vllm_a800/bin/python3.8)
Warming up…
Kernel running time: 306.206 us
==PROF== Disconnected from process 30384
==WARNING== No kernels were profiled.

The issue is probably related to the options I passed to ncu, so I am asking for suggestions here.

Thanks.

veraj · May 31, 2024, 10:29am

Hi, @xwentian

Sorry for the trouble. Which version of Nsight Compute are you using?

We recently published a new release. Can you try with that one?
If you still encounter the issue, can you provide a repro?
Thanks!

felix_dt · May 31, 2024, 10:42am

You should check which name ncu uses for these kernels by default, and whether the -k argument you passed matches that name. To do so, you can remove the -k option and get the unfiltered list of kernels.

Note that both the printed and matched name variants can be configured using the --kernel-name-base and --print-kernel-base options.
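
A minimal sketch of that check, written as a small Python helper that only assembles and prints the two command lines (it does not launch ncu). The helper name build_ncu_command is hypothetical. The options are the ones used earlier in this thread plus --kernel-name-base, and the regex: prefix on --kernel-name is used on the assumption that a substring match is wanted. Note also that the first run's Namespace line reports version 'v2' while the filter names paged_attention_v1_kernel, so a filter that matches no launched kernel is one plausible cause worth ruling out.

```python
import shlex


def build_ncu_command(script_args, kernel_regex=None, name_base="function"):
    """Assemble an ncu command line as a list of arguments (hypothetical helper)."""
    cmd = [
        "ncu", "--device", "0", "--target-processes", "all",
        "--section", "SpeedOfLight",
        # Match and filter on the plain function name rather than the full signature.
        "--kernel-name-base", name_base,
    ]
    if kernel_regex is not None:
        # "regex:" asks ncu to treat the filter as a regular expression.
        cmd += ["--kernel-name", f"regex:{kernel_regex}"]
    cmd += ["python", *script_args]
    return cmd


script = ["benchmark_paged_attention.py", "--version", "v1"]
# Step 1: no filter, to see which kernel names ncu reports.
print(shlex.join(build_ncu_command(script)))
# Step 2: filter again once the reported name is known.
print(shlex.join(build_ncu_command(script, kernel_regex="paged_attention")))
```

Running the unfiltered command first shows the names ncu reports, and the filter in the second command can then be adjusted to match one of them.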

xwentian · June 3, 2024, 1:46am

Hi, veraj

The Nsight Compute version in use is 2024.1.0.0 (build 33681293) (public-release):

xwentian@g0015:/data/xwentian$ ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2024.1.0.0 (build 33681293) (public-release)

xwentian · June 3, 2024, 1:52am

Hi felix_dt

Following your recommendation, I reran the command without the -k filter so that all kernels would be captured. The result still showed that no kernels had been profiled by ncu.

The output of the ncu command is listed below. I hope it helps in working out why no kernels are captured.

(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$ ncu --device 0 --target-processes all --section SpeedOfLight -o paged_attention python benchmark_paged_attention.py --version v1
Namespace(batch_size=8, block_size=16, dtype='half', head_size=128, kv_cache_dtype='auto', num_kv_heads=8, num_query_heads=64, profile=False, seed=0, seq_len=4096, use_alibi=False, version='v1')
==PROF== Connected to process 2026979 (/home/xwentian/.conda/envs/vllm_a800/bin/python3.8)
==ERROR== ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see NVIDIA Development Tools Solutions - | NVIDIA Developer
Warming up…
Kernel running time: 347.515 us
==PROF== Disconnected from process 2026979
==WARNING== No kernels were profiled.

Thanks.

veraj · June 3, 2024, 2:42am

Hi, @xwentian
Thanks for the detailed output. ERR_NVGPUCTRPERM indicates that your account is currently not permitted to access the GPU performance counters on this machine.
You need to follow the page referenced in the error message (NVIDIA Development Tools Solutions - | NVIDIA Developer) to grant the permission.
The quickest way is to run the command directly with sudo.
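
One practical detail when going the sudo route: sudo typically resets PATH (the secure_path setting in /etc/sudoers), so ncu and the conda environment's python may not be found unless absolute paths are given. A minimal sketch, assuming the script is started from the activated environment; the helper name sudo_ncu_command is hypothetical, and it only prints the command rather than running it.

```python
import shlex
import shutil
import sys


def sudo_ncu_command(script_args, ncu_path=None, python_path=None):
    """Build a sudo + ncu command that uses absolute paths for both tools."""
    # Fall back to whatever the current (non-sudo) environment resolves.
    ncu_path = ncu_path or shutil.which("ncu") or "ncu"
    python_path = python_path or sys.executable or "python"
    return ["sudo", ncu_path, "--device", "0", "-o", "paged_attention",
            python_path, *script_args]


# Print the command for review; run it by hand afterwards.
print(shlex.join(sudo_ncu_command(["benchmark_paged_attention.py", "--version", "v1"])))
```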

xwentian · June 3, 2024, 3:40am

Hi Veraj

I used the following commands to add my account to sudoers and then repeated the ncu command, but the output still shows that collecting kernel metrics fails as before. For convenience, I have pasted my commands below.

su root
visudo (in /etc/sudoers added one line under root ALL=(ALL) ALL)
su xwentian
exec bash
cd /home/xwentian
source activate vllm_a800
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$ cd /data/xwentian/vllm/benchmarks/kernels
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$ sudo apt-get update
Hit:1 https://download.docker.com/linux/ubuntu focal InRelease
Hit:3 Index of /ubuntu focal InRelease
Hit:2 Index of /compute/cuda/repos/ubuntu2004/x86_64 InRelease
Hit:4 Index of /ubuntu focal InRelease
Hit:5 Index of /ubuntu-toolchain-r/test/ubuntu focal InRelease
Hit:6 Index of /ubuntu focal-updates InRelease
Hit:7 Index of /ubuntu focal-security InRelease
Hit:8 Index of /ubuntu focal-backports InRelease
Reading package lists… Done
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$ apt-get update
Reading package lists… Done
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
E: Unable to lock directory /var/lib/apt/lists/
W: Problem unlinking the file /var/cache/apt/pkgcache.bin - RemoveCaches (13: Permission denied)
W: Problem unlinking the file /var/cache/apt/srcpkgcache.bin - RemoveCaches (13: Permission denied)
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$ ncu --device 0 --target-processes all --section SpeedOfLight -o paged_attention python benchmark_paged_attention.py --version v1
Namespace(batch_size=8, block_size=16, dtype='half', head_size=128, kv_cache_dtype='auto', num_kv_heads=8, num_query_heads=64, profile=False, seed=0, seq_len=4096, use_alibi=False, version='v1')
==PROF== Connected to process 2315375 (/home/xwentian/.conda/envs/vllm_a800/bin/python3.8)
==ERROR== ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see NVIDIA Development Tools Solutions - | NVIDIA Developer
Warming up…
Kernel running time: 347.802 us
==PROF== Disconnected from process 2315375
==WARNING== No kernels were profiled.
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$

Rgs.,

veraj · June 3, 2024, 4:15am

Apparently this method still did not give you full permission to run the profiler.

If you need to use your own account, please follow these instructions:

To allow access for any user, create a file with the .conf extension in /etc/modprobe.d containing the line "options nvidia NVreg_RestrictProfilingToAdminUsers=0", then reboot for the change to take effect.
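
A small sketch for checking whether the restriction is currently active, before and after the reboot. It assumes the driver reports the setting as an RmProfilingAdminOnly entry in /proc/driver/nvidia/params; that location is an assumption not stated in this thread, and the function name profiling_restricted is hypothetical. The option line itself is the one quoted above.

```python
from pathlib import Path

# Line to place in a .conf file under /etc/modprobe.d (root access required).
CONF_LINE = "options nvidia NVreg_RestrictProfilingToAdminUsers=0"


def profiling_restricted(params_text):
    """Return True/False from the RmProfilingAdminOnly entry, or None if absent."""
    for line in params_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


params = Path("/proc/driver/nvidia/params")  # assumed location of driver parameters
if params.is_file():
    print("restricted to admin users:", profiling_restricted(params.read_text()))
else:
    print("no NVIDIA driver parameter file found on this machine")
print("modprobe option line:", CONF_LINE)
```

If the function still reports True after the reboot, the .conf file was probably not picked up by the driver.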

xwentian · June 3, 2024, 5:21am

Hi Veraj

Thanks.

Before receiving your reply to my previous question, I edited /etc/sudoers and added the path /usr/local/cuda-12.1/bin to the "Defaults secure_path" line so that ncu can be used with sudo.

(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$ /home/xwentian/.conda/envs/vllm_a800/bin/python benchmark_paged_attention.py --version v1
Namespace(batch_size=8, block_size=16, dtype='half', head_size=128, kv_cache_dtype='auto', num_kv_heads=8, num_query_heads=64, profile=False, seed=0, seq_len=4096, use_alibi=False, version='v1')
Warming up…
Kernel running time: 657.015 us
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$ sudo ncu --device 0 -o paged_attention /home/xwentian/.conda/envs/vllm_a800/bin/python benchmark_paged_attention.py --version v1
Namespace(batch_size=8, block_size=16, dtype='half', head_size=128, kv_cache_dtype='auto', num_kv_heads=8, num_query_heads=64, profile=False, seed=0, seq_len=4096, use_alibi=False, version='v1')
==PROF== Connected to process 2807896 (/home/xwentian/.conda/envs/vllm_a800/bin/python3.8)

==ERROR== An error was reported by the driver

==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See 2. Kernel Profiling Guide — NsightCompute 12.5 documentation for more details.
==ERROR== Failed to profile “distribution_elementwise_grid…” in process 2807896
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
(vllm_a800) xwentian@g0015:/data/xwentian/vllm/benchmarks/kernels$

Have you seen this new error before?

Rgs.,

veraj · June 3, 2024, 5:28am

The new failure indicates that another application may be using the profiling resource.

Can you reboot and try again?

xwentian · June 3, 2024, 5:44am

I have to wait for some time before the machine can be rebooted safely.

BTW, if the new error is related to DCGM, my understanding is that dcgmi profile --pause should be run before profiling the CUDA kernels with ncu, and dcgmi profile --resume afterwards to restart DCGM profiling. Is that an appropriate way to handle DCGM while I profile the CUDA kernels, without having to reboot the entire machine?

Rgs.,

veraj · June 3, 2024, 6:53am

Stopping DCGM so that it releases the related resources is supposed to work.
However, I am not familiar with DCGM, so you will have to give it a try.
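
The pause/resume sequence asked about above can be sketched as a Python context manager, so that DCGM profiling is resumed even when the ncu run fails. This is an untested sketch under the assumption that dcgmi and ncu are on PATH for the user running it; the names dcgm_profiling_paused and profile_with_dcgm_paused are hypothetical, and nothing is executed until the second function is called.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def dcgm_profiling_paused(run=subprocess.run):
    """Pause DCGM profiling for the duration of the with-block, then resume it."""
    run(["dcgmi", "profile", "--pause"], check=True)
    try:
        yield
    finally:
        # Resume even if the profiling run inside the block raises.
        run(["dcgmi", "profile", "--resume"], check=True)


def profile_with_dcgm_paused(ncu_cmd, run=subprocess.run):
    """Run the given ncu command while DCGM profiling is paused."""
    with dcgm_profiling_paused(run=run):
        # check=False: a failed ncu run should not skip the resume step.
        return run(ncu_cmd, check=False)
```

A call could look like profile_with_dcgm_paused(["ncu", "--device", "0", "-o", "paged_attention", "python", "benchmark_paged_attention.py", "--version", "v1"]), with sudo and absolute paths added if the permission issue discussed earlier still applies.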

xwentian · June 12, 2024, 8:25am

Hi Veraj,

Thanks again for your feedback and help.

veraj · June 15, 2024, 12:00am

This topic was automatically closed after 15 hours. New replies are no longer allowed.
