Question about tracing CUDA function fork and finish?

I want to track when __global__ and __device__ CUDA functions fork (start) and finish.
Is this possible with Nsight Systems? If not, is it possible with Nsight Compute?

Hi sakaia,

I don’t fully understand your question. Can you rephrase it?


Thank you for your comment.

I am planning to trace CUDA kernels that use CDP (CUDA Dynamic Parallelism). (*)
As I understand it, tracing CDP requires tracing when both __global__ and __device__ CUDA functions fork and finish.
For this reason, I am looking for a tool that can trace both kinds of CUDA functions.

(*) Reference, for example:
cuda-samples/Samples/cdpQuadtree at master · NVIDIA/cuda-samples · GitHub

I’m sorry, but Nsight Systems cannot trace CDP kernels except on Pascal-architecture GPUs.

One more question.

Can Nsight Compute trace both CUDA kernels?

For Nsight Compute, showing individual child kernels from CDP launches is not supported. The complete tree is collapsed into one kernel.

Thank you for the explanation.

CUDA kernels under CDP can eject events.
I want to monitor their execution.
Or is that currently difficult, since the kernels are collapsed into one?

Can you explain what you mean by ‘eject event’?

Thank you for your reply.

I had in mind using cudaEvent_t or NVTX to log timing information for functions under CDP.
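
For illustration, here is a minimal sketch of one way such time information could be logged from inside a CDP kernel tree without relying on a profiler. It does not use cudaEvent_t or NVTX: NVTX is a host-side API, and device-side events must be created with cudaEventDisableTiming, so the sketch reads the GPU-wide %globaltimer register instead. The kernel names parent and child, the helper now_ns, and the three-slot log layout are all hypothetical, and the timer's update resolution varies by GPU, so the values should be treated as coarse.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the GPU-wide nanosecond timer (PTX special register %globaltimer).
__device__ unsigned long long now_ns() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Hypothetical log layout:
//   log[0] = parent timestamp just before the child launch
//   log[1] = child start, log[2] = child end
__global__ void child(unsigned long long *log) {
    if (threadIdx.x == 0) log[1] = now_ns();
    // ... the child's real work would go here ...
    __syncthreads();
    if (threadIdx.x == 0) log[2] = now_ns();
}

__global__ void parent(unsigned long long *log) {
    log[0] = now_ns();
    child<<<1, 32>>>(log);  // device-side launch (CDP)
}

int main() {
    unsigned long long *d_log = nullptr;
    unsigned long long h_log[3] = {0, 0, 0};

    cudaMalloc(&d_log, sizeof(h_log));
    cudaMemset(d_log, 0, sizeof(h_log));

    parent<<<1, 1>>>(d_log);
    // The parent grid is not complete until its child grids are complete,
    // so this host-side synchronization covers the child as well.
    cudaDeviceSynchronize();

    cudaMemcpy(h_log, d_log, sizeof(h_log), cudaMemcpyDeviceToHost);
    printf("parent launch: %llu ns\n", h_log[0]);
    printf("child start:   %llu ns\n", h_log[1]);
    printf("child end:     %llu ns\n", h_log[2]);

    cudaFree(d_log);
    return 0;
}
```

Device-side launches need relocatable device code and the device runtime library, so a build line would look something like nvcc -rdc=true sketch.cu -lcudadevrt (the file name is a placeholder).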

My colleague, who works on ncu, told me that ‘Observing cudaEvents launched from CDP kernels is not something that ncu can do.’

Thank you for looking into this.
I understand that the two methods give different views.
I would appreciate it if you could provide the following information.

1) Are both methods meaningful for monitoring CDP kernels or not?
2) If possible, could you explain the difference between these methods?