Question about ncu profiling

I am trying to profile the cuda-samples kernels with ncu. One sample produces a profile, but the other does not. Is there a reason for this? In the failed case, ncu prints:

==WARNING== No kernels were profiled.

Success
/opt/nvidia/NVIDIA-Nsight-Compute-2021.2/ncu --target-processes all --set default matrixMul

Failed
/opt/nvidia/NVIDIA-Nsight-Compute-2021.2/ncu --target-processes all --set default bandwidthTest

Thanks for your help.

Assuming you're running the bandwidthTest sample (Samples/1_Utilities/bandwidthTest in the NVIDIA/cuda-samples repository on GitHub), I don't think that test launches any CUDA kernels. It just performs a series of memory copies to measure memory bandwidth, so ncu has nothing to profile and reports "No kernels were profiled." Nsight Compute is mainly for profiling CUDA kernels as they run on the device. For example, this is a kernel launch in matrixMul:
MatrixMulCUDA<16><<<grid, threads, 0, stream>>>(d_C, d_A, d_B, dimsA.x, dimsB.x);
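If you want to check a binary for kernels before profiling it, one option is to list the functions in the device code embedded in the executable. This is a sketch under assumptions: list_kernels is a hypothetical bash helper (not part of any NVIDIA tool), cuobjdump from the CUDA toolkit is assumed to be on your PATH (or set CUOBJDUMP to its full path), and the sample is assumed to have been built with SASS for at least one GPU architecture. An empty result would be consistent with there being no kernels for ncu to profile.

```shell
# Hypothetical helper: print the unique kernel (function) names found in the
# SASS embedded in a CUDA binary. cuobjdump prints a "Function : <name>" header
# per function per architecture, so duplicates across architectures are removed.
# Set CUOBJDUMP to use a specific cuobjdump binary.
list_kernels() {
  "${CUOBJDUMP:-cuobjdump}" -sass "$1" 2>/dev/null \
    | sed -n 's/^[[:space:]]*Function : \([^[:space:]]*\).*/\1/p' \
    | sort -u
}
```

For example, `list_kernels ./matrixMul` should print the mangled names of the MatrixMulCUDA template instantiations (pipe the output through cu++filt to demangle them), while I'd expect `list_kernels ./bandwidthTest` to print nothing.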

Nsight Systems could help you visualize memory copy performance. Please let me know if this answers your question or if there are any additional details I could provide.
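As a rough sketch of that, assuming nsys is on your PATH (or NSYS points at it), the run below traces only CUDA API calls and GPU activity and prints summary tables, including memory copy statistics, when the application exits. nsys_memcpy_stats is a hypothetical helper name, and the report name it chooses is just a convention for this example.

```shell
# Hypothetical helper: trace CUDA activity with Nsight Systems and print the
# summary statistics (kernels, memory copies, API calls) at the end of the run.
# Set NSYS to a full path if nsys is not on PATH.
nsys_memcpy_stats() {
  local app="$1"; shift            # application binary; remaining args go to it
  "${NSYS:-nsys}" profile \
    --trace=cuda \
    --stats=true \
    --force-overwrite=true \
    -o "$(basename "$app")_nsys" \
    "$app" "$@"
}
```

For example, `nsys_memcpy_stats ./bandwidthTest` writes a report named after the binary, which can then be opened in the Nsight Systems GUI to see the copies on a timeline.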

Thanks for your suggestion. I am trying to dig into the problem.
It seems ncu is too heavyweight for profiling my case (nsys profile runs successfully).
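If ncu's overhead is the issue, one thing that may help is collecting less per kernel and stopping after a few launches. This is only a sketch under assumptions: ncu_light is a hypothetical bash helper, NCU can be set to your ncu path (for example /opt/nvidia/NVIDIA-Nsight-Compute-2021.2/ncu), and SpeedOfLight is assumed to be an available section identifier in your version (`ncu --list-sections` shows what is installed).

```shell
# Hypothetical helper: a lighter Nsight Compute run than "--set default".
# It collects only the SpeedOfLight section and stops after a fixed number of
# kernel launches, which may reduce kernel-replay overhead on long runs.
# Set NCU to a specific install, e.g. /opt/nvidia/NVIDIA-Nsight-Compute-2021.2/ncu
ncu_light() {
  local max_launches="$1"; shift   # how many kernel launches to profile
  "${NCU:-ncu}" \
    --target-processes all \
    --section SpeedOfLight \
    --launch-count "$max_launches" \
    "$@"                           # application and its arguments
}
```

For example, `ncu_light 5 ./matrixMul` profiles only the first five kernel launches of matrixMul.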

I am considering profiling each CUDA kernel separately with ncu.
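A sketch of that approach, under assumptions: ncu_per_kernel is a hypothetical bash helper, kernel names are passed as plain function names (e.g. MatrixMulCUDA rather than the mangled name), and --kernel-name is assumed to do an exact match on that function name by default, which is how I understand it; `ncu --help` for your version will confirm. Each kernel gets its own ncu run and its own report file.

```shell
# Hypothetical helper: profile each named kernel in a separate ncu session, so
# each run only replays one kernel (its first launch) and writes its own report.
# Application arguments are not forwarded in this sketch.
ncu_per_kernel() {
  local app="$1"; shift            # application binary to launch
  local kernel
  for kernel in "$@"; do           # remaining args: kernel function names
    "${NCU:-ncu}" \
      --target-processes all \
      --set default \
      --kernel-name "$kernel" \
      --launch-count 1 \
      --force-overwrite \
      -o "$(basename "$app")_${kernel}" \
      "$app" || return $?
  done
}
```

For example, `NCU=/opt/nvidia/NVIDIA-Nsight-Compute-2021.2/ncu ncu_per_kernel ./matrixMul MatrixMulCUDA`. Note that the application is launched once per kernel name, which trades total wall time for smaller, more manageable profiling runs.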