I want to compare the utilization of the CUDA cores and the Tensor Cores, so which tool should I use, Nsight Systems or Nsight Compute? GPU: NVIDIA Tesla V100S-PCIE-32GB
For Nsight Systems, I have looked through the documentation, and it seems I can get SM utilization (including Tensor Core utilization) through this command:
nsys profile --gpu-metrics-device
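For context, a fuller invocation might look like the sketch below. The device index `0`, report name `report`, and application path `./app` are placeholders, not values from my setup; per the documentation quoted below, the Tensor Active metric row only appears on supported (Turing or newer) GPUs.

```shell
# Hedged sketch: sample GPU metrics (SM Active, Tensor Active, etc.)
# for device 0 while profiling an application.
# `./app` is a placeholder for the target binary.
nsys profile --gpu-metrics-device=0 -o report ./app
```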
but there are some limits (per the official documentation):
Nsight Systems GPU Metrics is only available for Linux targets on x86-64 and aarch64, and for Windows targets. It requires NVIDIA Turing architecture or newer.
Minimum required driver versions:
* NVIDIA Turing architecture TU10x, TU11x - r440
* NVIDIA Ampere architecture GA100 - r450
* NVIDIA Ampere architecture GA100 MIG - r470 TRD1
* NVIDIA Ampere architecture GA10x - r455
Does this mean that I can't use --gpu-metrics-device to observe Tensor Core utilization, since my GPU is Volta architecture? The following is my command and result:
Nsight Compute can give you many pipeline utilization metrics, including the ones you listed, but only on a per-kernel (or per-range, depending on the replay mode) level. It does not provide such information with time-correlated granularity.
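For illustration, a per-kernel collection with `ncu` might look like the sketch below. The metric names here are assumptions based on the common `sm__pipe_*` naming scheme; exact names differ by architecture, so verify them with `ncu --query-metrics` on your system before relying on them.

```shell
# Hedged sketch: per-kernel utilization of the tensor pipe and the FMA
# (FP32 / "CUDA core") pipe, as a percentage of peak sustained rate.
# Metric names are assumed -- confirm with `ncu --query-metrics`.
# `./app` is a placeholder for the target binary.
ncu --metrics \
sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active,\
sm__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active \
./app
```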
On Turing+ the General Metrics for XYZ set does not have the FP32 Utilization, which would be CUDA Core Utilization. The Graphics Throughput Metrics for XYZ should have the FP32 Utilization, which is logically equivalent to CUDA Cores.
The NVIDIA Streaming Multiprocessors (SMs) have many different instruction execution pipes (fmalite, fmaheavy, alu, fma64, xu/sfu, lsu, tex, ...). "CUDA cores" refers to the number of FP32 operations the FP32 execution units (the fma, fmalite, and fmaheavy pipes) can perform per cycle. FP32 FLOPS and the FP32 execution unit count are one of the common mechanisms used to compare graphics-focused GPUs. HPC comparisons use FP64 FLOPS, and inference/training comparisons use mixed-precision TOPS.
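As a worked example of how FP32 execution units relate to FLOPS: peak FP32 throughput is cores x 2 (each FMA counts as two floating-point operations) x clock. The core count (5120) and the ~1597 MHz boost clock below are the published V100S figures, used here only as an illustration:

```shell
# Theoretical peak FP32 TFLOPS for a V100S-class GPU.
# 5120 CUDA cores * 2 ops per FMA * ~1.597 GHz boost clock
awk 'BEGIN { printf "%.2f\n", 5120 * 2 * 1.597e9 / 1e12 }'
```

which comes out to roughly 16.35 TFLOPS, matching the commonly quoted FP32 figure for this card.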