NVProf for NCCL program

Name: Analyzing NCCL Usage with NVIDIA Nsight Systems
Uploaded: 2021-05-28T13:45:16Z
Description: Have you tried Nsight Systems? [Analyzing NCCL Usage with NVIDIA Nsight Systems]

Daniel_Wong · May 26, 2021, 6:02pm

Hi all,

When I profile my NCCL program with nvprof using --metrics all, the result is always the following:

==2781== NVPROF is profiling process 2781, command: ./nccl_example 2 16
==2781== Profiling application: ./nccl_example 2 16
==2781== Profiling result:
No events/metrics were profiled.

Here is my simple NCCL program:

#include <stdio.h>
#include <stdlib.h>   // malloc, free, exit, EXIT_FAILURE
#include "cuda_runtime.h"
#include "nccl.h"

#define CUDACHECK(cmd) do {                         \
  cudaError_t e = cmd;                              \
  if( e != cudaSuccess ) {                          \
    printf("Failed: Cuda error %s:%d '%s'\n",             \
        __FILE__,__LINE__,cudaGetErrorString(e));   \
    exit(EXIT_FAILURE);                             \
  }                                                 \
} while(0)


#define NCCLCHECK(cmd) do {                         \
  ncclResult_t r = cmd;                             \
  if (r!= ncclSuccess) {                            \
    printf("Failed, NCCL error %s:%d '%s'\n",             \
        __FILE__,__LINE__,ncclGetErrorString(r));   \
    exit(EXIT_FAILURE);                             \
  }                                                 \
} while(0)


int main(int argc, char* argv[])
{
  ncclComm_t comms[4];


  //managing 3 devices (the arrays are sized for up to 4)
  int nDev = 3;
  int size = 32*1024*1024;
  int devs[4] = {0, 1, 2};

  //allocating and initializing device buffers
  float** sendbuff = (float**)malloc(nDev * sizeof(float*));
  float** recvbuff = (float**)malloc(nDev * sizeof(float*));
  cudaStream_t* s = (cudaStream_t*)malloc(sizeof(cudaStream_t)*nDev);


  for (int i = 0; i < nDev; ++i) {
    CUDACHECK(cudaSetDevice(i));
    CUDACHECK(cudaMalloc(sendbuff + i, size * sizeof(float)));
    CUDACHECK(cudaMalloc(recvbuff + i, size * sizeof(float)));
    CUDACHECK(cudaMemset(sendbuff[i], 1, size * sizeof(float)));
    CUDACHECK(cudaMemset(recvbuff[i], 0, size * sizeof(float)));
    CUDACHECK(cudaStreamCreate(s+i));
  }


  //initializing NCCL
  NCCLCHECK(ncclCommInitAll(comms, nDev, devs));
//   printf("1--- \n");

  //calling NCCL communication API. Group API is required when using
  //multiple devices per thread
  NCCLCHECK(ncclGroupStart());
  for (int i = 0; i < nDev; ++i)
    NCCLCHECK(ncclAllReduce((const void*)sendbuff[i], (void*)recvbuff[i], 
                                size, ncclFloat, ncclSum, comms[i], s[i]));
  NCCLCHECK(ncclGroupEnd());


  //synchronizing on CUDA streams to wait for completion of NCCL operation
  for (int i = 0; i < nDev; ++i) {
    CUDACHECK(cudaSetDevice(i));
    CUDACHECK(cudaStreamSynchronize(s[i]));
  }

//   printf("2--- \n");

  //free device buffers
  for (int i = 0; i < nDev; ++i) {
    CUDACHECK(cudaSetDevice(i));
    CUDACHECK(cudaFree(sendbuff[i]));
    CUDACHECK(cudaFree(recvbuff[i]));
  }


  //finalizing NCCL
  for(int i = 0; i < nDev; ++i)
      NCCLCHECK(ncclCommDestroy(comms[i]));

  //free the host-side pointer arrays
  free(sendbuff);
  free(recvbuff);
  free(s);

  printf("Success \n");
  return 0;
}

I am asking because I need detailed metrics for the NCCL API calls.
Thanks!

mnicely · May 28, 2021, 1:45pm

Have you tried Nsight Systems?

Robert_Crovella · May 28, 2021, 3:16pm
