Nsight Compute fails to profile L20 GPU

274962485 · April 8, 2024, 3:44am

When I use ncu to profile a CUDA kernel (GEMM) on an L20 GPU, it fails and reports the following error:

==PROF== Connected to process 3522403 (/data/workspace/gemm_test)
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile “Kernel” in process 3522403
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.

My command:

ncu --set full -o gemm_test ./gemm_test 256 128 64 fp16

Nsight Compute version:

NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.2.2.0 (build 33188574) (public-release)

GPU info:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L20                     On  | 00000000:23:00.0 Off |                    0 |
| N/A   28C    P8              36W / 350W |      0MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L20                     On  | 00000000:33:00.0 Off |                    0 |
| N/A   29C    P8              35W / 350W |      0MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L20                     On  | 00000000:34:00.0 Off |                    0 |
| N/A   37C    P0              88W / 350W |  14894MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L20                     On  | 00000000:43:00.0 Off |                    0 |
| N/A   29C    P8              37W / 350W |      0MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

This binary runs correctly and can be profiled on A10 and A100 GPUs.
Does Nsight Compute support the L20 GPU?

Robert_Crovella · April 8, 2024, 4:05pm

Can you try again with the latest version of Nsight Compute? Currently that is version 2024.1.1.

274962485 · April 9, 2024, 7:03am

It still fails after updating ncu to version 2024.1.1.
Command: /usr/local/NVIDIA-Nsight-Compute-2024.1/ncu --set full -f -o gemm_profile ./gemm_test 64 256 1024 fp16

==PROF== Connected to process 5061 (/data/shuren/gemm_test)
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile “ampere_bf16_s16816gemm_bf16_6…” in process 5061
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.

Version:

/usr/local/NVIDIA-Nsight-Compute-2024.1/ncu --version

NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2024 NVIDIA Corporation
Version 2024.1.1.0 (build 33998838) (public-release)

Robert_Crovella · April 9, 2024, 12:23pm

OK, we are getting some additional information now:

Can you try updating to CUDA 12.4 Update 1? Make sure you also install the driver that ships with it. Then verify your CUDA install by running some of the suggested sample programs, such as vectorAdd.
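Before re-running ncu after an upgrade, it can help to confirm that the driver actually loaded is the new one. A minimal bash sketch, assuming GNU `sort -V` is available; `driver_at_least` and `version_ge` are hypothetical helper names, and the minimum version is a value you supply yourself (for example from the CUDA 12.4 Update 1 release notes), not something this sketch knows.

```shell
#!/usr/bin/env bash
# Sketch: check that the loaded NVIDIA driver is at least a given version.
# driver_at_least and version_ge are hypothetical helpers; GNU sort -V is assumed.

# version_ge A B: succeed if dotted version A >= dotted version B.
version_ge() {
  # sort -V puts the smaller version first; if that is B, then A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# driver_at_least MIN: read the driver version of the first GPU from
# nvidia-smi and compare it with MIN.
driver_at_least() {
  local min="$1" current
  current="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  if [ -z "$current" ]; then
    echo "could not read the driver version from nvidia-smi" >&2
    return 2
  fi
  echo "driver: $current (required: >= $min)"
  version_ge "$current" "$min"
}
```

Usage: `driver_at_least <minimum-version> || echo "driver is older than expected"`. With the driver shown in the original nvidia-smi output (535.154.05), any minimum from the 550 series would fail the check.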

274962485 · April 10, 2024, 11:37am

I have updated the CUDA driver and the CUDA toolkit. My versions are now:

Unfortunately, Nsight Compute still fails.
The vectorAdd sample runs correctly:

./Samples/0_Introduction/vectorAdd/vectorAdd

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

but it can’t be profiled by ncu:

ncu -o vecAdd_profile ./Samples/0_Introduction/vectorAdd/vectorAdd
[Vector addition of 50000 elements]
==PROF== Connected to process 42839 (/data/shuren/code/cuda-samples/Samples/0_Introduction/vectorAdd/vectorAdd)
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile “vectorAdd” in process 42839
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.

274962485 · April 10, 2024, 11:42am

By the way, I run the kernel on device 1. GPU 0 in the picture is being used by other people.
Do you have a working way to profile with ncu on an L20 GPU? Then I can reproduce your setup. Thanks.

Robert_Crovella · April 10, 2024, 1:03pm

Suggestions:

Try running with CUDA_VISIBLE_DEVICES="1", like this:

CUDA_VISIBLE_DEVICES="1" ncu -o vecAdd_profile ./Samples/0_Introduction/vectorAdd/vectorAdd

Make sure the instructions here have been properly applied to your machine.

If neither of those suggestions helps, I suggest asking for help on the Nsight Compute forum.
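The CUDA_VISIBLE_DEVICES suggestion can be wrapped in a small bash function so the GPU index is always set the same way. A minimal sketch: `ncu_on_gpu` is a hypothetical helper name, and `CUDA_DEVICE_ORDER=PCI_BUS_ID` is an addition not part of the suggestion above. It makes CUDA number GPUs in PCI bus order, which is the order nvidia-smi uses (CUDA's default order is FASTEST_FIRST). Inside the profiled process the selected GPU is visible as device 0, so an application that explicitly selects device 1 (for example with `cudaSetDevice(1)`) would have to select device 0 instead.

```shell
#!/usr/bin/env bash
# Sketch: run ncu so the target process can see only one physical GPU.
# ncu_on_gpu is a hypothetical helper name.
# CUDA_DEVICE_ORDER=PCI_BUS_ID aligns CUDA's GPU numbering with nvidia-smi;
# the chosen GPU then appears as device 0 inside the profiled process.
ncu_on_gpu() {
  local gpu="$1"
  shift
  # Remaining arguments are passed to ncu unchanged (options, then the app).
  CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES="$gpu" ncu "$@"
}
```

Usage: `ncu_on_gpu 1 -o vecAdd_profile ./Samples/0_Introduction/vectorAdd/vectorAdd`. Because the variables are set only for the ncu invocation, they do not leak into the rest of the shell session.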

274962485 · April 11, 2024, 3:05am

Thanks, I have asked for help on the Nsight Compute forum.
