Hi, I am trying to profile the matrix multiplication kernel in /cuda-samples/Samples/0_Introduction/matrixMul/.
I am currently getting this error:
[Matrix Multiply Using CUDA] - Starting...
==PROF== Connected to process 117662 (/cuda-samples/Samples/0_Introduction/matrixMul/matrixMul)
GPU Device 0: "Ampere" with compute capability 8.6
MatrixA(320,320), MatrixB(640,320)
CUDA error at matrixMul.cu:348 code=2(cudaErrorMemoryAllocation) "cudaProfilerStart()"
==PROF== Disconnected from process 117662
==ERROR== The application returned an error code (1).
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.
This is the output for running without ncu:
$ ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 2213.51 GFlop/s, Time= 0.059 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
This is the output for running with ncu:
$ sudo /usr/local/cuda-11.4/bin/ncu -o profile ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
==PROF== Connected to process 117381 (/gpu/cuda-samples/Samples/0_Introduction/matrixMul/matrixMul)
GPU Device 0: "Ampere" with compute capability 8.6
MatrixA(320,320), MatrixB(640,320)
CUDA error at matrixMul.cu:348 code=2(cudaErrorMemoryAllocation) "cudaProfilerStart()"
==PROF== Disconnected from process 117381
==ERROR== The application returned an error code (1).
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.
Thanks for those details. I notice that you use “sudo” for the ncu part. Sometimes this can change paths, environment variables, etc. Can you verify that “sudo ./matrixMul” runs okay?
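To check whether sudo is changing the environment, one option is to dump the environment both ways and compare the variables that CUDA tools tend to depend on. This is only a rough sketch: show_vars and the two file names are made up for illustration, and the variable list (PATH, LD_LIBRARY_PATH, CUDA_VISIBLE_DEVICES) is an assumption about what might matter on your system.

```shell
# Hypothetical helper: print selected variables from a saved `env` dump
# (a file of NAME=value lines), sorted by name.
show_vars() {
    grep -E '^(PATH|LD_LIBRARY_PATH|CUDA_VISIBLE_DEVICES)=' "$1" | sort
}

# Usage (left as comments because it needs sudo):
#   env > user_env.txt
#   sudo env > sudo_env.txt
#   show_vars user_env.txt
#   show_vars sudo_env.txt
```

If the two outputs differ, that would point toward the environment. If they match, the failure under sudo probably has a different cause.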
Also, if your system is configured with permissions, you should be able to run ncu without sudo. What happens if you do that? If you have an issue, is it this one and are you able to enable permissions?
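For the permissions question, a quick way to see how the driver is currently configured on Linux is to look at the RmProfilingAdminOnly entry in /proc/driver/nvidia/params. The sketch below assumes that file uses "Name: value" lines; profiling_restricted is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Hypothetical helper: report whether the NVIDIA driver restricts GPU
# performance counters to admin users.
# Argument: path to the driver params file (defaults to the usual Linux location).
profiling_restricted() {
    params="${1:-/proc/driver/nvidia/params}"
    if [ ! -r "$params" ]; then
        # File missing or unreadable, so nothing can be concluded.
        echo "unknown"
        return 0
    fi
    # A value of 1 means only admin users may access the counters.
    if grep -q '^RmProfilingAdminOnly:[[:space:]]*1' "$params"; then
        echo "restricted"
    else
        echo "unrestricted"
    fi
}

# Usage:
#   profiling_restricted
```

If this prints "restricted", non-admin users will get ERR_NVGPUCTRPERM from ncu until an administrator changes the driver setting as described on the page linked in the error message.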
Lastly, I see that you’re using the Nsight Compute from CUDA 11.4, which is quite old. You can download and install just the latest Nsight Compute from the product page and use it with your existing CUDA installation. That is something else that is worth trying to see if it fixes the issue.
It looks like sudo ./matrixMul does not work. Can I ask how I can fix this sudo path problem?
$ sudo ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6
MatrixA(320,320), MatrixB(640,320)
CUDA error at matrixMul.cu:348 code=2(cudaErrorMemoryAllocation) "cudaProfilerStart()"
I ran into the exact error that you mentioned when running ncu ./matrixMul, which is why I had to run with sudo. Unfortunately, I am not able to get permission.
$ ncu ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
==PROF== Connected to process 92755 (/cuda-samples/Samples/0_Introduction/matrixMul/matrixMul)
GPU Device 0: "Ampere" with compute capability 8.6
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
==ERROR== Error: ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM
done
Performance= 747.78 GFlop/s, Time= 0.175 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 92755
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.