Hi, I have two Nsight Systems profiles, shown here: one with MPI-only communication and one with NCCL-based communication. In the NCCL-based profile, does seeing GR Active and SM Active at the same time confirm that communication and computation are overlapping?
@jyi can you explain how to use the recipe for communication/compute overlap?
We have two recipes available for communication/compute overlap:
- The nccl_gpu_overlap_trace recipe provides the overlap percentage for each kernel.
- The nccl_gpu_time_util_map recipe provides the overlap heatmap.
You can run these recipes using the CLI command nsys recipe <recipe name> --input <input> [<args>]. To use this command, you’ll need to install some Python dependencies, which can be automated with the script mentioned in the nsys recipe --help message.
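As a concrete sketch of the command form above: the report name report1.nsys-rep is only a placeholder for your own report file, and the guard simply skips the call when nsys or the report is not available.

```shell
# Hypothetical report file name; substitute your own .nsys-rep file.
REPORT="report1.nsys-rep"
# Recipe name taken from the list above.
RECIPE="nccl_gpu_overlap_trace"

if command -v nsys >/dev/null 2>&1 && [ -f "$REPORT" ]; then
  # Same form as: nsys recipe <recipe name> --input <input>
  nsys recipe "$RECIPE" --input "$REPORT"
else
  echo "skipping: nsys or $REPORT not available"
fi
```

Setting RECIPE to nccl_gpu_time_util_map should produce the heatmap variant instead.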
It is showing “ERROR: Unknown recipe.”
Also, how do I do the same thing for NVSHMEM, i.e. capture it while profiling?
What version of Nsys are you using? The NCCL recipe is relatively new.
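A quick way to check, sketched under the assumption that nsys is on your PATH: print the installed version, then look at the recipe help output, which should show whether the NCCL recipes are available in that installation.

```shell
# Name of the Nsight Systems CLI binary (assumed to be on PATH).
NSYS_BIN="nsys"

if command -v "$NSYS_BIN" >/dev/null 2>&1; then
  # Print the installed Nsight Systems version.
  "$NSYS_BIN" --version
  # Show the recipe help message; an older release may not know the NCCL recipes.
  "$NSYS_BIN" recipe --help
else
  echo "skipping: $NSYS_BIN not found on PATH"
fi
```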
I ran the recipe successfully, but it shows no data found. This happens not only with nccl_gpu_overlap_trace but with every nccl recipe.
Sure, here is the report file:
report1.zip (7.6 MB)
@jyi, please take a look at that.
Your report file doesn’t contain any NCCL data, so the recipe is not able to analyze it.
Since NCCL is instrumented using NVTX annotations, you will need to enable NVTX tracing to get this data. Could you profile your application again with NVTX added to the trace option, like this: --trace=cuda,mpi,nvtx?
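A full command line might look like the sketch below. The application name ./my_app, the rank count, the mpirun launcher, and the output name report_nccl are all placeholders for illustration; only the --trace value comes from the suggestion above.

```shell
# Trace option suggested above: CUDA, MPI, and NVTX (which carries the NCCL annotations).
TRACE="cuda,mpi,nvtx"
# Hypothetical application, rank count, and output name; adjust for your job.
APP="./my_app"
NPROCS=2
OUT="report_nccl"

if command -v nsys >/dev/null 2>&1 && command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
  # Profile the whole MPI launch into one report file.
  nsys profile --trace="$TRACE" --output "$OUT" mpirun -np "$NPROCS" "$APP"
else
  echo "skipping: nsys, mpirun, or $APP not available"
fi
```

The resulting report can then be passed to the nccl recipes with --input.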
Do I need to add NVTX markers in my code as well, or is that not needed?
Not needed.
NCCL is already instrumented with NVTX. When you turn on NVTX tracing in Nsys, you will get the places where it is annotated (which wrap most of the more useful operations).
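If you want to confirm that a report actually contains NVTX ranges before running the recipes, one option is the NVTX summary from nsys stats. This is a sketch: the report file name is a placeholder, and the report name nvtx_sum is the one used by recent Nsys releases (older releases may name it differently).

```shell
# Hypothetical report file produced with NVTX tracing enabled.
REPORT="report_nccl.nsys-rep"
# NVTX range summary; name assumed from recent Nsys releases.
STATS_REPORT="nvtx_sum"

if command -v nsys >/dev/null 2>&1 && [ -f "$REPORT" ]; then
  # NCCL ranges should be listed here if NVTX data was captured.
  nsys stats --report "$STATS_REPORT" "$REPORT"
else
  echo "skipping: nsys or $REPORT not available"
fi
```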