Ncu profiling failed to profile specific kernels

skps23 · January 22, 2025, 1:48am

I am trying to profile an ML workload using:

ncu --target-processes all -k regex:"xmma" -o profile_ncu python3 application.py

However, I am getting the following error.

==ERROR== Unknown Error on device 4.
==ERROR== Failed to profile "sm90_xmma_fprop_implicit_gemm..." in process 12601
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
make: *** [Makefile:46: run_harness] Error 9

May I know what is causing this issue?

veraj · January 22, 2025, 2:59am

Hi, @skps23

Please first check whether “python3 application.py” runs to completion successfully.

skps23 · January 22, 2025, 5:37am

The code runs fine when ncu is not invoked. Even a plain ncu python3 application.py causes the profiling error.

veraj · January 22, 2025, 5:55am

1. Please try to profile a simple CUDA sample, not a Python script, to see whether the problem reproduces.
2. Please provide the exact Python version on your machine.

skps23 · January 22, 2025, 6:00am

It is exactly NVIDIA’s submission of the DLRM workload in MLPerf Inference 4.1. The entire run is done via the Docker image provided by NVIDIA. The above run is the DLRM inference workload in Offline mode. This is the repo

skps23 · January 22, 2025, 6:08am

Regarding 1), I tried profiling a helloworld program with ncu. The code is as follows:

#include <stdio.h>

__global__ void helloWorld() {
    printf("Hello, World from GPU!\n");
}

int main() {
    printf("Hello, World from CPU!\n");
    helloWorld<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}

The code is compiled with nvcc helloworld.cu -o helloworld

$ ./helloworld

Hello, World from CPU!
Hello, World from GPU!

However, when running with ncu:

$ ncu ./helloworld
Hello, World from CPU!
==PROF== Connected to process 21957 (/work/helloworld)
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile "helloWorld()" in process 21957
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.

Is the above error related to my earlier error?

veraj · January 22, 2025, 6:34am

Thanks.
Are you also running ncu ./helloworld in Docker?
Can you tell me which driver, GPU, OS, and ncu version you are using?

skps23 · January 22, 2025, 5:15pm

I am running the helloworld program inside docker. The machine is DGX H200x8, Ubuntu 22.04.4 LTS (Jammy Jellyfish)

Inside the Docker container, ncu is version 2024.1.1.0 (build 33998838) (public-release)

I tried ncu ./helloworld outside the container and it worked fine. The ncu version on the host (outside the container) is 2024.3.1.0 (build 34702747) (public-release)
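
Since the only difference reported between the failing and the working run is the ncu version, a quick way to compare the two environments can be sketched as below. This is a minimal sketch, not an official procedure: the helper names ncu_version and version_lt are hypothetical, the parsing assumes ncu --version prints a line of the form "Version X.Y.Z.W (build ...)" as quoted above, and the comparison relies on sort -V from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the dotted version number from
# `ncu --version` style text read on stdin. Assumes a line such as:
#   Version 2024.1.1.0 (build 33998838) (public-release)
ncu_version() {
    sed -n 's/.*Version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds (exit 0) if version $1 is strictly
# older than version $2. Uses GNU `sort -V` (version sort).
version_lt() {
    [ "$1" != "$2" ] && \
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the version found in the current environment, if ncu is on PATH.
# Run this once inside the container and once on the host.
if command -v ncu >/dev/null 2>&1; then
    echo "ncu version here: $(ncu --version | ncu_version)"
fi
```

With the two versions from this thread, version_lt 2024.1.1.0 2024.3.1.0 succeeds, which flags the container's ncu as the older one.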

skps23 · January 22, 2025, 10:43pm

Thanks for prompting this discussion. I also installed version 2024.3 of ncu inside the Docker image, and I am now able to profile the kernels with ncu inside the container.

The purpose of this task is to find out the input matrix sizes passed to the kernel. I have posted a question here.

Can you tell me whether we can get the input matrix size information with ncu?

felix_dt · January 27, 2025, 7:37am

The interactive profiling activity shows the API parameters for each CUDA function call and kernel launch, but there is no option to capture and export these from the non-interactive activity or the command line.

skps23 · January 27, 2025, 8:36pm

Unfortunately, I do not have GUI access and this needs to be done via terminal. I am looking for solutions via terminal.
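
As a partial, terminal-only workaround, the launch configuration of each profiled kernel can be read from a saved report without the GUI. The sketch below assumes the report produced by -o profile_ncu is named profile_ncu.ncu-rep; the function name export_raw_csv and the output file name are hypothetical. Note that this does not recover the API parameters mentioned above: launch metrics such as launch__grid_size and launch__block_size describe the grid and block dimensions, not the input matrix shapes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump the raw metric page of a saved ncu report
# as CSV, entirely from the terminal.
#   $1 = path to the .ncu-rep report, $2 = output CSV path
export_raw_csv() {
    local report="$1" out="$2"
    if [ ! -f "$report" ]; then
        echo "report not found: $report" >&2
        return 1
    fi
    # --import reads a saved report instead of profiling;
    # --page raw lists all collected metrics per kernel;
    # --csv prints the console output as comma-separated values.
    ncu --import "$report" --page raw --csv > "$out"
}

# Assumed file name: `-o profile_ncu` writes profile_ncu.ncu-rep.
if [ -f profile_ncu.ncu-rep ]; then
    export_raw_csv profile_ncu.ncu-rep profile_ncu_raw.csv
fi
```

The resulting CSV can then be searched for the launch__grid_size and launch__block_size columns of the xmma kernels of interest.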
