Cannot profile kernel from CUDA samples

unital · May 24, 2023, 8:39am

Hi, I am trying to profile the matrix multiplication kernel in /cuda-samples/Samples/0_Introduction/matrixMul/

I am currently getting this error:

[Matrix Multiply Using CUDA] - Starting...

==PROF== Connected to process 117662 (/cuda-samples/Samples/0_Introduction/matrixMul/matrixMul)

GPU Device 0: "Ampere" with compute capability 8.6

MatrixA(320,320), MatrixB(640,320)

CUDA error at matrixMul.cu:348 code=2(cudaErrorMemoryAllocation) "cudaProfilerStart()"

==PROF== Disconnected from process 117662

==ERROR== The application returned an error code (1).

==WARNING== No kernels were profiled.

==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.

Can I ask how to fix this? Thanks!

jmarusarz · May 24, 2023, 8:52pm

Can you run the sample without Nsight Compute? Can you post the command line and output from running it without Nsight Compute, and then from running it with Nsight Compute?

unital · May 25, 2023, 12:34am

Hi, this is the output for running without ncu:

$ ./matrixMul 

[Matrix Multiply Using CUDA] - Starting...

GPU Device 0: "Ampere" with compute capability 8.6

MatrixA(320,320), MatrixB(640,320)

Computing result using CUDA Kernel...

done

Performance= 2213.51 GFlop/s, Time= 0.059 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block

Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

This is the output for running with ncu:

$ sudo /usr/local/cuda-11.4/bin/ncu -o profile ./matrixMul

[Matrix Multiply Using CUDA] - Starting...

==PROF== Connected to process 117381 (/gpu/cuda-samples/Samples/0_Introduction/matrixMul/matrixMul)

GPU Device 0: "Ampere" with compute capability 8.6

MatrixA(320,320), MatrixB(640,320)

CUDA error at matrixMul.cu:348 code=2(cudaErrorMemoryAllocation) "cudaProfilerStart()"

==PROF== Disconnected from process 117381

==ERROR== The application returned an error code (1).

==WARNING== No kernels were profiled.

==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.

jmarusarz · May 25, 2023, 8:43pm

Thanks for those details. I notice that you use “sudo” for the ncu part. Sometimes this can change paths, environment variables, and so on. Can you verify that “sudo ./matrixMul” runs okay?

Also, if your system is configured with the right permissions, you should be able to run ncu without sudo. What happens if you do that? If you hit an issue, is it this one, and are you able to enable permissions?

Lastly, I see that you’re using the Nsight Compute from CUDA 11.4, which is quite old. You can download and install just the latest Nsight Compute from the product page and use it with your existing CUDA installation. That is something else that is worth trying to see if it fixes the issue.
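
For the permissions question above, here is a minimal sketch of how one might check whether the driver restricts GPU performance counters to admin users. It assumes a Linux system where the NVIDIA kernel module lists its parameters in /proc/driver/nvidia/params and reports this setting as RmProfilingAdminOnly. The helper name profiling_restricted is hypothetical and not part of any NVIDIA tool.

```shell
# Hypothetical helper: report whether GPU performance counters are admin-only.
# Assumption: the driver lists "RmProfilingAdminOnly: <0|1>" in its params file.
profiling_restricted() {
    params_file="${1:-/proc/driver/nvidia/params}"
    # Without a readable params file (no driver loaded, other OS) we cannot tell.
    if [ ! -r "$params_file" ]; then
        echo "unknown"
        return 0
    fi
    # Extract the value that follows "RmProfilingAdminOnly:".
    value=$(sed -n 's/^RmProfilingAdminOnly:[[:space:]]*//p' "$params_file")
    case "$value" in
        1) echo "restricted" ;;
        0) echo "unrestricted" ;;
        *) echo "unknown" ;;
    esac
}

profiling_restricted
```

If this prints "restricted", running ncu as a regular user is expected to fail with ERR_NVGPUCTRPERM. An administrator can usually lift the restriction by loading the nvidia module with the option NVreg_RestrictProfilingToAdminUsers=0 (for example through a file under /etc/modprobe.d/) and rebooting. Without admin rights, running ncu as root remains the alternative.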

unital · May 27, 2023, 11:06am

Thanks for the reply.

It looks like sudo ./matrixMul does not work. Can I ask how I can fix this sudo path problem?

$ sudo ./matrixMul

[Matrix Multiply Using CUDA] - Starting...

GPU Device 0: "Ampere" with compute capability 8.6

MatrixA(320,320), MatrixB(640,320)

CUDA error at matrixMul.cu:348 code=2(cudaErrorMemoryAllocation) "cudaProfilerStart()"

I ran into the exact error that you mentioned when running ncu ./matrixMul - that’s why I had to run with sudo. Unfortunately, I am not able to get permission.

$ ncu ./matrixMul

[Matrix Multiply Using CUDA] - Starting...

==PROF== Connected to process 92755 (/cuda-samples/Samples/0_Introduction/matrixMul/matrixMul)

GPU Device 0: "Ampere" with compute capability 8.6

MatrixA(320,320), MatrixB(640,320)

Computing result using CUDA Kernel...

==ERROR== Error: ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM

done

Performance= 747.78 GFlop/s, Time= 0.175 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block

Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

==PROF== Disconnected from process 92755

==WARNING== No kernels were profiled.

==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.

Based on this, how should I go about fixing it? Thanks!

jmarusarz · May 30, 2023, 5:12pm

It’s not easy to say exactly what the issue is with the sudo profile. What I usually do is run “sudo -i” to switch to a superuser account, or log in as root, and then try to get a CUDA application running correctly. It’s likely that something isn’t installed or configured correctly in the root/sudo environment. Once you figure out what wasn’t configured correctly, you may be able to use that to fix the “sudo ./matrixMul” command and get it to run correctly.
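
One way to look for such a configuration difference is to compare the environment that plain sudo provides with the environment of a root login shell. The sketch below is only an illustration: env_diff is a hypothetical helper, the /tmp file names are placeholders, and the thread does not establish which setting, if any, actually differed on this system.

```shell
# Hypothetical helper: list variables that differ between two `env` dumps.
# Lines prefixed "<" come from the first dump, ">" from the second.
env_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort "$1" > "$a"
    sort "$2" > "$b"
    # Keep only the changed lines from diff's output.
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}

# Usage (needs sudo rights, so left commented out):
#   sudo env    > /tmp/env_sudo.txt    # environment seen by "sudo ./matrixMul"
#   sudo -i env > /tmp/env_root.txt    # environment of a root login shell
#   env_diff /tmp/env_sudo.txt /tmp/env_root.txt | grep -E 'PATH|CUDA|HOME'
```

Variables such as PATH, LD_LIBRARY_PATH, and HOME are reasonable first candidates to inspect, since sudo commonly resets or preserves them differently from a login shell.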

unital · May 31, 2023, 1:30am

Thanks a lot for the reply. Running

$ sudo -i 
# /usr/local/cuda-11.4/bin/ncu ./matrixMul

fixed the problem. Thanks!
