How to get Nsight Compute timeline of tensor cores and cuda cores?

hyaloids · March 7, 2024, 7:42am

Hi, I'm trying to get a timeline of Tensor Core and CUDA core activity in the SM from Nsight Compute. What can I do?

I ran the ncu command and got an .ncu-rep file, but I couldn't find any timeline in Nsight Compute.

Can anyone help?

veraj · March 7, 2024, 9:57am

Sorry, Nsight Compute does not provide such a timeline.

hyaloids · March 7, 2024, 12:07pm

Well, that's OK. Is there any way to get Tensor Core and CUDA core runtime/memory utilization from Nsight Compute?
Thanks for your reply.

veraj · March 12, 2024, 2:40am

Hi, @hyaloids

Sorry for the late response.

The PmSampling section gives you a timeline view of the utilization of the SM, and of the Tensor pipe in particular, on GA100 and newer GPUs. It can be collected with --section PmSampling, or as part of the full set.
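A minimal Python sketch that assembles the two collection commands mentioned above. It only builds and prints the command lines; `./my_app` and the report names `pm_report`/`full_report` are hypothetical placeholders, and running the printed commands assumes `ncu` is on your PATH.

```python
import shlex
from typing import List, Optional, Sequence


def ncu_command(app: Sequence[str], output: str,
                sections: Sequence[str] = (),
                profile_set: Optional[str] = None) -> List[str]:
    """Build an ncu command line that saves its results to a report file."""
    cmd = ["ncu"]
    if profile_set is not None:
        cmd += ["--set", profile_set]        # e.g. "full"
    for section in sections:
        cmd += ["--section", section]        # e.g. "PmSampling"
    cmd += ["-o", output]                    # report file name
    cmd += list(app)                         # application and its arguments
    return cmd


if __name__ == "__main__":
    # "./my_app" is a hypothetical CUDA binary; substitute your own program.
    commands = [
        ncu_command(["./my_app"], "pm_report", sections=["PmSampling"]),
        ncu_command(["./my_app"], "full_report", profile_set="full"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try; once the binary path is real, the same list can be passed to subprocess.run.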

SM and Tensor Core utilization are also reported in the SpeedOfLight section, which is part of the default basic set, with per-pipeline details available in the GPU Throughput Breakdown tables. For more detail, other sets include the ComputeWorkloadAnalysis section, which breaks down the utilization of each individual compute pipeline.
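If you want to read these numbers outside the GUI, a saved report can be printed as CSV with `ncu --import report.ncu-rep --csv --page details`. The sketch below groups the exported metrics by section. It assumes the CSV header contains "Section Name", "Metric Name" and "Metric Value" columns and that the two sections are labeled "GPU Speed Of Light Throughput" and "Compute Workload Analysis"; these labels may differ between ncu versions, so check your own export. `report.ncu-rep`, `report.csv` and `ncu_sections.py` are placeholder names.

```python
import csv
import sys
from collections import defaultdict
from typing import Dict, Iterable

# Assumed labels in the CSV "Section Name" column; verify against your export.
SECTIONS_OF_INTEREST = ("GPU Speed Of Light Throughput", "Compute Workload Analysis")


def metrics_by_section(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map section name -> {metric name: metric value} from ncu CSV output.

    Values are kept as strings because ncu may format numbers (for example
    with digit grouping). If several kernels were profiled, rows for later
    kernels overwrite earlier ones with the same section and metric name.
    """
    result: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in csv.DictReader(lines):
        result[row["Section Name"]][row["Metric Name"]] = row["Metric Value"]
    return dict(result)


if __name__ == "__main__":
    # Usage (placeholder file names):
    #   ncu --import report.ncu-rep --csv --page details > report.csv
    #   python ncu_sections.py report.csv
    if len(sys.argv) < 2:
        print("usage: python ncu_sections.py report.csv")
    else:
        with open(sys.argv[1], newline="") as f:
            sections = metrics_by_section(f)
        for name in SECTIONS_OF_INTEREST:
            for metric, value in sections.get(name, {}).items():
                print(f"{name}: {metric} = {value}")
```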

Greg · March 12, 2024, 7:38pm

Using these definitions:

CUDA core = 1 lane of the FP32 instruction pipeline = 1 thread of FFMA/FADD/FMUL
Tensor Core = 1 lane of the Tensor instruction pipeline. The definition of “core” differs between NVIDIA Tensor Core generations.

The closest hardware match for utilization is counting the number of cycles during which the FMA pipes (CUDA cores) and the Tensor pipe (Tensor Cores) are active:

CUDA core utilization = (SM FMA Light Pipe Throughput + SM FMA Heavy Pipe Throughput) / 2
Tensor Core utilization = SM Tensor Pipe Throughput
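To make the arithmetic concrete, here is a small Python sketch of these two formulas. The class and field names are hypothetical and simply mirror the throughput labels; the inputs are percentages you would read from the report, and the numbers in the example are illustrative, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class PipeThroughput:
    """Pipe throughput percentages (0-100) read from an Nsight Compute report.

    Field names are hypothetical; they mirror the labels used in the post.
    """
    fma_light: float  # "SM FMA Light Pipe Throughput"
    fma_heavy: float  # "SM FMA Heavy Pipe Throughput"
    tensor: float     # "SM Tensor Pipe Throughput"


def cuda_core_utilization(t: PipeThroughput) -> float:
    # Average of the FMA Light and FMA Heavy pipe throughputs.
    return (t.fma_light + t.fma_heavy) / 2


def tensor_core_utilization(t: PipeThroughput) -> float:
    # Tensor Core utilization is the Tensor pipe throughput itself.
    return t.tensor


if __name__ == "__main__":
    # Illustrative inputs only, not a real measurement.
    t = PipeThroughput(fma_light=30.0, fma_heavy=50.0, tensor=75.0)
    print(f"CUDA cores:   {cuda_core_utilization(t):.1f}%")   # 40.0%
    print(f"Tensor Cores: {tensor_core_utilization(t):.1f}%")  # 75.0%
```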

Note that the FMA pipe (CC 7.0 to 8.0), or the FMA Heavy pipe (CC > 8.0), can also execute non-FP32 instructions, including FP16 and IMAD, so its activity is not purely FP32 work.

Keep in mind that these “cores” are not equivalent to CPU cores. Because they are instruction pipelines, it is unclear what memory utilization would mean for these two “core” types.

