NVIDIA Nsight Compute: The profiler returned an error code: 1

mbxust0901 · January 7, 2024, 5:32am

GPU: NVIDIA GeForce RTX 4060 Laptop GPU

CUDA Version: 11.8
Nsight Compute version: 2023.3.1.0 (build 33474944) (public-release)

==PROF== Connected to process 29004 (E:\Workspace\learning\cuda\CudaRuntime\x64\Debug\CudaRuntime.exe)
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile “addKernel” in process 29004
==PROF== Trying to shutdown target application
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.

veraj · January 12, 2024, 7:55am

Hi, @mbxust0901

Sorry you ran into this issue.
Does it reproduce with any CUDA sample, or only with this specific one?

mahmoud.sedahmed · January 12, 2024, 12:27pm

Hi Veraj,

I am facing the same issue: it occurs with every CUDA sample I have tried to profile. It happens with the “full” metric set, but other sets, such as “detailed” and “basic”, work fine.

mbxust0901 · January 13, 2024, 5:58pm

It reproduces with any CUDA sample.

veraj · January 15, 2024, 6:20am

Hi, @mbxust0901

We can’t reproduce your issue internally with version 2023.3.1.0 and driver 546.33. Does the sample run successfully without NCU?

mbxust0901 · January 15, 2024, 11:40am

The sample runs successfully without NCU. Is it related to the CUDA Tools version or my GPU “NVIDIA GeForce RTX 4060 Laptop GPU”?

veraj · January 16, 2024, 2:54am

There doesn’t seem to be an issue with your GPU or tools version. Have you enabled performance counter access in the NVIDIA Control Panel?

Can you try running a CUDA SDK sample such as vectorAdd or matrixMul, and then run ncu $sample directly?

mbxust0901 · January 16, 2024, 7:57am

I’ve enabled performance access.

Without NCU:

[Matrix Multiply Using CUDA] - Starting…
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 8.9 is undefined. Default to use Hopper
GPU Device 0: “Hopper” with compute capability 8.9

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
Performance= 61.14 GFlop/s, Time= 2.144 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

With NCU:

[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 40340 (E:\Workspace\github\cuda-samples\bin\win64\Debug\matrixMul.exe)
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 8.9 is undefined. Default to use Hopper
GPU Device 0: “Hopper” with compute capability 8.9

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile “MatrixMulCUDA” in process 40340
==PROF== Trying to shutdown target application
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.

veraj · January 16, 2024, 8:55am

Can you please try the CUDA 12.3 samples? The output below doesn’t look right, since you are actually running on an Ada GPU.

MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
MapSMtoArchName for SM 8.9 is undefined. Default to use Hopper
GPU Device 0: “Hopper” with compute capability 8.9
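
The “Hopper” label in the output above is most likely a side effect of how the samples' helper code maps a compute capability to an architecture name: when the SM version is missing from its built-in table, it prints the “is undefined” message and falls back to a default entry. Below is a minimal Python sketch of that lookup-with-fallback pattern. The table contents and the function name `map_sm_to_arch_name` are illustrative assumptions, not the samples' actual source.

```python
# Hypothetical table standing in for an older samples release that
# predates SM 8.9 (assumption: the real table lives in the samples' helper header).
SM_TO_ARCH = [
    ((7, 5), "Turing"),
    ((8, 0), "Ampere"),
    ((8, 6), "Ampere"),
    ((9, 0), "Hopper"),
]


def map_sm_to_arch_name(major, minor, table=SM_TO_ARCH):
    """Return (arch_name, warning); warning is None when the SM version is known."""
    for (t_major, t_minor), name in table:
        if (t_major, t_minor) == (major, minor):
            return name, None
    # Unknown SM version: fall back to the last table entry and warn.
    default_name = table[-1][1]
    warning = (
        f"MapSMtoArchName for SM {major}.{minor} is undefined. "
        f"Default to use {default_name}"
    )
    return default_name, warning


# Older table: SM 8.9 is missing, so the lookup falls back to the last entry.
old_name, old_warning = map_sm_to_arch_name(8, 9)

# Newer table (hypothetical): SM 8.9 is listed, so no fallback is needed.
new_table = SM_TO_ARCH[:3] + [((8, 9), "Ada")] + SM_TO_ARCH[3:]
new_name, new_warning = map_sm_to_arch_name(8, 9, new_table)

print(old_name, "|", old_warning)
print(new_name, "|", new_warning)
```

If this is what happened, the mislabel only affects what the sample prints and does not by itself explain the profiler error.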

mbxust0901 · January 17, 2024, 7:23am

With the CUDA 12.3 samples, I get the output below.

Without NCU:

[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: “Ada” with compute capability 8.9

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
Performance= 1044.62 GFlop/s, Time= 0.125 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

With NCU:
[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 7400 (E:\Workspace\github\cuda-samples-12.3\Samples\0_Introduction\matrixMul\matrixMul.exe)
GPU Device 0: “Ada” with compute capability 8.9

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile “MatrixMulCUDA” in process 7400
==PROF== Trying to shutdown target application
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.

veraj · January 18, 2024, 3:11am

Thanks for the update.

So now you are using the 12.3 samples + Nsight Compute 2023.3.1.0 (build 33474944) + driver 546.33 + NVIDIA GeForce RTX 4060 Laptop GPU, and any sample triggers “==ERROR== Unknown Error on device 0”.

Can you help us isolate this?
For example, run ncu --section ${section_name} or ncu --metrics ${metrics_name} to check whether any individual section or metric works.

You can also check in the NCU UI: use “Interactive Profile => Run to Next Kernel => Profile Kernel” to see whether a different error is printed.

I will also check with our engineering team to see whether there is anything else we can do. Thanks!
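
The per-section isolation described in this post could be scripted roughly as follows. This is only a sketch, assuming `ncu` is on PATH. The section names in `SECTIONS` are examples and should be replaced with the identifiers that `ncu --list-sections` reports on your install, and the helper names (`build_command`, `profiling_failed`, `isolate`) are hypothetical.

```python
import subprocess

# Example section identifiers (assumption): confirm with `ncu --list-sections`.
SECTIONS = ["SpeedOfLight", "Occupancy", "LaunchStats", "MemoryWorkloadAnalysis"]


def build_command(section, target):
    """Build the ncu command line that collects a single section."""
    return ["ncu", "--section", section, target]


def profiling_failed(output):
    """Treat any ==ERROR== line in the captured output as a failed run."""
    return any("==ERROR==" in line for line in output.splitlines())


def isolate(target, sections=SECTIONS, run=None):
    """Return {section: True if that section profiled without ==ERROR== lines}."""
    if run is None:
        def run(cmd):
            # Capture stdout and stderr together, since either may carry ncu messages.
            proc = subprocess.run(cmd, capture_output=True, text=True)
            return proc.stdout + proc.stderr
    return {
        section: not profiling_failed(run(build_command(section, target)))
        for section in sections
    }
```

Calling `isolate("matrixMul.exe")` from the sample's directory would then show which sections, if any, can be collected. The `run` parameter exists so the logic can be exercised without a GPU.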

mbxust0901 · January 18, 2024, 3:42am

Yes.

Using ncu --metrics does not work either.

~ ncu --metrics l1tex__t_bytes_pipe_lsu_mem_global_op_ld.sum.per_second,l1tex__t_bytes_pipe_lsu_mem_global_op_st.sum.per_second E:\Workspace\github\cuda-samples-12.3\Samples\0_Introduction\matrixMul\matrixMul.exe
[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 33848 (E:\Workspace\github\cuda-samples-12.3\Samples\0_Introduction\matrixMul\matrixMul.exe)
GPU Device 0: “Ada” with compute capability 8.9

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile “MatrixMulCUDA” in process 33848
==PROF== Trying to shutdown target application
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.

And with NCU-UI:

When using ncu Version 2022.3.0.0 (build 31729285):

[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 43752 (E:\Workspace\github\cuda-samples-12.3\Samples\0_Introduction\matrixMul\matrixMul.exe)
GPU Device 0: “Ada” with compute capability 8.9

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
==ERROR== Profiling is not supported on device 0. To find out supported GPUs refer --list-chips option.
done
Performance= 1028.99 GFlop/s, Time= 0.127 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 43752
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.

Thank you. Your assistance is greatly appreciated!
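
The two ncu versions above fail in different ways (“Unknown Error on device 0” versus “Profiling is not supported on device 0”). To compare runs like these side by side, a small helper can reduce each captured log to its ==ERROR== and ==WARNING== lines. The sketch below assumes the log has been saved as text; the function name `summarize` and the returned dictionary layout are hypothetical, and the regular expression accepts both straight and curly quotes because the forum rendering changes them.

```python
import re

# Matches: Failed to profile "kernel" in process 1234 (straight or curly quotes).
FAILED_KERNEL = re.compile(r'Failed to profile ["“](.+?)["”] in process (\d+)')


def summarize(log):
    """Collect ncu error and warning messages, plus any kernels that failed to profile."""
    errors, warnings, failed = [], [], []
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith("==ERROR=="):
            message = line[len("==ERROR=="):].strip()
            errors.append(message)
            match = FAILED_KERNEL.search(message)
            if match:
                failed.append((match.group(1), int(match.group(2))))
        elif line.startswith("==WARNING=="):
            warnings.append(line[len("==WARNING=="):].strip())
    return {"errors": errors, "warnings": warnings, "failed_kernels": failed}


# Lines copied from the ncu --metrics run earlier in this thread.
sample_log = """Computing result using CUDA Kernel…
==ERROR== Failed to prepare kernel for profiling

==ERROR== Unknown Error on device 0.
==ERROR== Failed to profile “MatrixMulCUDA” in process 33848
==PROF== Trying to shutdown target application
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
"""

summary = summarize(sample_log)
print(summary["failed_kernels"])
print(summary["errors"])
```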

veraj · January 22, 2024, 7:08am

Hi, @mbxust0901

Our developer set up exactly the same test configuration as yours, but he also can’t reproduce the issue.

veraj · March 18, 2024, 12:00am

This topic was automatically closed after 12 days. New replies are no longer allowed.
