I tried to profile my kernels with this command: “ncu --target-processes all -o profile ./program”.
I didn’t see any reports or statistical data for child kernels launched in the parent kernel. The report is only for the parent kernels.
Are there any methods to profile child kernels?
You may want to clarify what you mean by “child kernels” and “parent kernel”. Are you using CUDA Dynamic Parallelism, or are you using CUDA Graphs, or are you referring to device functions called from global functions?
Thanks for your reply!
I mean CUDA Dynamic Parallelism, as mentioned in the title. By “child kernels” I mean the kernels launched from inside the parent kernels.
I found that the child kernels’ code does show up on the Source page.
Thanks, I missed checking the title. Nsight Compute does not support providing isolated data for child launches. All data and metrics are for the entire tree of parent and child launches. As you found, the Source page will include data for the entire tree, too.
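For anyone who finds this thread later, here is a minimal sketch of the kind of launch tree being discussed. It is not the original poster's program: `parentKernel`, `childKernel`, and `runTree` are hypothetical names chosen for illustration, and the sizes are arbitrary. Each parent thread launches one child grid from the device, so profiling it should attribute the children's work to the parent launch, as described above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical child kernel: launched from device code, not from the host.
__global__ void childKernel(int *data, int offset)
{
    data[offset + threadIdx.x] = offset + threadIdx.x;
}

// Hypothetical parent kernel: each thread launches one child grid
// that fills its own slice of the buffer.
__global__ void parentKernel(int *data, int childThreads)
{
    int offset = threadIdx.x * childThreads;
    childKernel<<<1, childThreads>>>(data, offset);
}

// Runs the parent/child tree once and checks the result.
// Returns the number of mismatched elements, or -1 on a CUDA error.
int runTree(int parentThreads, int childThreads)
{
    const int n = parentThreads * childThreads;
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return -1;

    parentKernel<<<1, parentThreads>>>(dev, childThreads);

    // The parent grid is not complete until all of its children are,
    // so this host-side synchronize covers the whole tree.
    cudaError_t err = cudaDeviceSynchronize();

    std::vector<int> host(n, -1);
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dev, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != i) ++mismatches;
    return mismatches;
}

int main()
{
    int mismatches = runTree(4, 32);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Device-side launches need relocatable device code and the device runtime library, so a build line along the lines of “nvcc -rdc=true -o program program.cu -lcudadevrt” should work (the file name is an assumption). Running the ncu command from the first post on this binary would then be expected to report `parentKernel` only, with the child launches folded into its metrics.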