Nsight Systems profiling: "GR Active" and "SM Active"

Hey, I have two Nsight Systems profiles here: one with MPI only and one with NCCL-based communication. In the NCCL-based profile, does seeing both GR Active and SM Active confirm that communication and computation overlap?


@jyi can you explain how to use the recipe for communication/compute overlap?

We have two recipes available for communication/compute overlap:

  • The nccl_gpu_overlap_trace recipe provides the overlap percentage for each kernel.
  • The nccl_gpu_time_util_map recipe provides the overlap heatmap.
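To make "overlap percentage for each kernel" concrete, here is a minimal sketch of the idea. It is not the recipe's actual implementation: the function names and timestamps are made up for illustration, and it simply measures how much of one kernel's time span is covered by other kernels running concurrently.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extends or sits inside the previous interval: grow it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_percent(kernel, others):
    """Percentage of `kernel`'s duration covered by any interval in `others`."""
    k_start, k_end = kernel
    duration = k_end - k_start
    if duration <= 0:
        return 0.0
    covered = 0
    # Merging first avoids double-counting time covered by several kernels.
    for start, end in merge_intervals(others):
        covered += max(0, min(k_end, end) - max(k_start, start))
    return 100.0 * covered / duration


# Made-up timestamps in nanoseconds, for illustration only.
nccl_kernel = (100, 200)
compute_kernels = [(50, 120), (110, 150), (180, 260)]
print(overlap_percent(nccl_kernel, compute_kernels))  # 70.0
```

In this example the communication kernel runs from 100 to 200 ns, and compute kernels cover 100 to 150 and 180 to 200 of that span, so 70% of it overlaps with compute.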

You can run these recipes using the CLI command nsys recipe <recipe name> --input <input> [<args>]. To use this command, you’ll need to install some Python dependencies, which can be automated with the script mentioned in the nsys recipe --help message.

It is showing:
“ERROR: Unknown recipe.”

Also, how do I capture the same thing for NVSHMEM when profiling?

What version of Nsys are you using? The NCCL recipe is relatively new.

I ran the recipe successfully, but it reports that no data was found. This happens not only with nccl_gpu_overlap_trace but with every NCCL recipe.

@jyi are there any known issues here?

@manver can you share the report files with us?

sure here is the report file

report1.zip (7.6 MB)

@jyi, please take a look at that.

Your report file doesn’t contain any NCCL data, so the recipe is not able to analyze it.

Since NCCL is instrumented using NVTX annotations, you will need to enable NVTX to get this data. Could you profile your application by adding NVTX to the trace option like this: --trace=cuda,mpi,nvtx?
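Put together, the workflow is: re-profile with NVTX tracing enabled, then run the recipe on the new report. Below is a small dry-run sketch that only builds and prints the two command lines. The application launch line (mpirun -n 2 ./my_app) and the output name report_nvtx are placeholders, not taken from your setup, so adjust them before running anything.

```python
import shlex


def profile_cmd(app_cmd, output="report_nvtx"):
    """Build an `nsys profile` command that also traces NVTX (needed for NCCL data)."""
    return ["nsys", "profile", "--trace=cuda,mpi,nvtx", "--output", output, *app_cmd]


def recipe_cmd(recipe, report):
    """Build an `nsys recipe` command for an existing report file."""
    return ["nsys", "recipe", recipe, "--input", report]


if __name__ == "__main__":
    # Hypothetical application launch line; replace with your own.
    app = ["mpirun", "-n", "2", "./my_app"]
    steps = [
        profile_cmd(app),
        recipe_cmd("nccl_gpu_overlap_trace", "report_nvtx.nsys-rep"),
    ]
    # Dry run: print the commands so they can be reviewed and pasted into a shell.
    for cmd in steps:
        print(shlex.join(cmd))
```

The same recipe_cmd call works for nccl_gpu_time_util_map if you want the heatmap instead.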

Do I need to add NVTX markers to my code as well, or is that not needed?

Not needed.

NCCL is already instrumented with NVTX. When you turn on NVTX tracing in Nsys, you will get the ranges where it is annotated (which wrap most of the more useful operations).
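If a recipe still reports no data, one quick sanity check is to see whether the report contains any NVTX events at all. The sketch below assumes you have first exported the report to SQLite (for example with nsys export --type sqlite --output report1.sqlite report1.nsys-rep) and that NVTX ranges end up in a table named NVTX_EVENTS. That table name is an assumption about the export schema, so verify it against your Nsys version.

```python
import sqlite3
import sys


def count_nvtx_events(sqlite_path):
    """Return the number of rows in NVTX_EVENTS, or 0 if the table is absent."""
    con = sqlite3.connect(sqlite_path)
    try:
        # Assumed table name; it may be missing entirely when NVTX was not traced.
        has_table = con.execute(
            "SELECT 1 FROM sqlite_master WHERE type='table' AND name='NVTX_EVENTS'"
        ).fetchone()
        if not has_table:
            return 0
        return con.execute("SELECT COUNT(*) FROM NVTX_EVENTS").fetchone()[0]
    finally:
        con.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_nvtx.py report1.sqlite
    print(count_nvtx_events(sys.argv[1]))
```

A count of zero suggests the profile was collected without NVTX tracing. A nonzero count only shows that some NVTX ranges were captured, not that they necessarily come from NCCL.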