How to get an Nsight Compute timeline of Tensor Cores and CUDA cores?

Hi, I am trying to get a timeline of Tensor Core and CUDA core activity within an SM, like the one below, from Nsight Compute. How can I do this?
image

I ran the ncu command and got an ncu-rep file, but I found no timeline when I opened it in Nsight Compute.

Can anyone help answer this question?

Hi, @hyaloids

I’m sorry, but Nsight Compute does not provide such a timeline.

Well, that’s OK. Is there any way to get the runtime/memory utilization of the Tensor Cores and CUDA cores from Nsight Compute?
Thanks for your reply.

Hi, @hyaloids

Sorry for the late response.

The PmSampling section will give you a timeline view of the utilization of the SM, and of the tensor pipe in particular, on GA100 and newer. It can be collected with --section PmSampling, or as part of the full set.

SM and Tensor Core utilization is also part of the default SpeedOfLight section in the basic set, with per-pipeline details available in the GPU Throughput Breakdown tables. For more detail, other sets include the ComputeWorkloadAnalysis section, which reports the utilization of each individual compute pipeline.
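As a rough sketch of how the sections mentioned above could be requested, the Python helper below only assembles an ncu command line and prints it; it does not run anything. The section names come from this thread. The application path `./my_app`, the report name `report`, and the helper `build_ncu_command` are hypothetical placeholders, and the `-o` and `--set full` options are assumed to behave as in current ncu releases.

```python
import shlex


def build_ncu_command(app_argv, report_name="report",
                      sections=("PmSampling",), use_full_set=False):
    """Assemble an argv list for ncu. Nothing is executed here.

    app_argv and report_name are placeholders for your own application
    and output file; they are not taken from the thread.
    """
    cmd = ["ncu", "-o", report_name]
    if use_full_set:
        # Per the reply above, the full set already includes PmSampling.
        cmd += ["--set", "full"]
    else:
        # Request each section explicitly.
        for section in sections:
            cmd += ["--section", section]
    return cmd + list(app_argv)


# Collect the three sections named in the reply for a hypothetical app.
cmd = build_ncu_command(
    ["./my_app"],
    sections=("PmSampling", "SpeedOfLight", "ComputeWorkloadAnalysis"),
)
print(shlex.join(cmd))
```

Passing `use_full_set=True` instead would request the full set rather than individual sections.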


Using these definitions:

  • CUDA core = 1 lane of FP32 instruction pipeline = 1 thread of FFMA/FADD/FMUL
  • Tensor core = 1 lane of the Tensor instruction pipeline. The definition of “core” differs between NVIDIA Tensor Core generations.

The closest match in hardware for utilization is to count the number of cycles in which the FMA pipes (CUDA cores) and the Tensor pipe (Tensor Cores) are active:

  • CUDA Cores = (SM FMA Light Pipe Throughput + SM FMA Heavy Pipe Throughput) / 2
  • Tensor Cores = SM Tensor Pipe Throughput
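A minimal sketch of the two formulas above, assuming the pipe throughput values have already been read from the report as percentages. The function names and the sample numbers are made up for illustration and are not Nsight Compute metric names or real measurements.

```python
def cuda_core_utilization(fma_light_pct, fma_heavy_pct):
    """Average the FMA Light and FMA Heavy pipe throughputs (both in %)."""
    return (fma_light_pct + fma_heavy_pct) / 2.0


def tensor_core_utilization(tensor_pipe_pct):
    """Tensor Core utilization is taken directly from the Tensor pipe throughput (%)."""
    return tensor_pipe_pct


# Hypothetical throughput percentages, not measured data.
fma_light, fma_heavy, tensor_pipe = 40.0, 20.0, 75.0

print(f"CUDA cores:   {cuda_core_utilization(fma_light, fma_heavy):.1f}%")
print(f"Tensor Cores: {tensor_core_utilization(tensor_pipe):.1f}%")
```

The FMA pipes also execute some non-FP32 instructions, so the averaged figure is an approximation of FP32 "CUDA core" activity, not an exact count.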

The FMA Pipe (CC 7.0 - 8.0) or FMA Heavy Pipe (CC >8.0) can execute non-FP32 instructions including FP16 and IMAD.

Note that these “cores” are not equivalent to CPU cores. Because they are instruction pipelines, it is unclear what memory utilization would mean for these two “core” types.

