I also wrote a dummy program that creates multiple streams, some of them running GEMM on Tensor Cores and the rest running non-MMA ops on CUDA cores. I see some overlap between the two kinds of streams, but I cannot confirm whether the overlapping streams run on the same SM, since nvprof + nvvp doesn't expose that information.
Can someone from NVIDIA help confirm whether this is possible? If so, how do I explicitly program the GPUs to run with more parallelism?
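For reference, the stream setup of the dummy program looks roughly like this. This is a minimal sketch, not the exact program: the kernel, matrix size `N`, and allocation sizes are illustrative, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cublas_v2.h>

// Plain FP32 arithmetic intended for the CUDA (non-MMA) cores.
__global__ void fma_kernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 1000; ++k)
            x[i] = x[i] * 1.0001f + 0.0001f;
}

int main() {
    const int N = 4096;  // illustrative GEMM dimension
    cudaStream_t sTensor, sCuda;
    cudaStreamCreate(&sTensor);
    cudaStreamCreate(&sCuda);

    // Tensor Core GEMM issued on one stream via cuBLAS.
    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSetStream(handle, sTensor);
    cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);

    __half *A, *B;
    float *C, *x;
    cudaMalloc(&A, (size_t)N * N * sizeof(__half));
    cudaMalloc(&B, (size_t)N * N * sizeof(__half));
    cudaMalloc(&C, (size_t)N * N * sizeof(float));
    cudaMalloc(&x, N * sizeof(float));

    float alpha = 1.0f, beta = 0.0f;
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                 &alpha, A, CUDA_R_16F, N, B, CUDA_R_16F, N,
                 &beta,  C, CUDA_R_32F, N,
                 CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);

    // Non-MMA kernel on the other stream; the hardware may
    // overlap it with the GEMM, which is what shows up in the trace.
    fma_kernel<<<(N + 255) / 256, 256, 0, sCuda>>>(x, N);

    cudaDeviceSynchronize();
    return 0;
}
```

Whether the two streams actually overlap, and on which SMs, is exactly what the profiler timeline is needed to answer.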
Nsight Systems should provide the insight you’re looking for.
If so, how do I explicitly program the GPUs to run with more parallelism?
Please don't try to do this. The hardware scheduler will do a better job than anything you can arrange programmatically. If streams don't overlap, it's usually due to a lack of resources.
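For example, a timeline showing per-stream kernel activity can be captured with the Nsight Systems CLI (the binary name `./app` is a placeholder for your program):

```shell
# Capture CUDA and cuBLAS activity; nsys ships with the CUDA Toolkit
nsys profile -o overlap_report --trace=cuda,cublas ./app

# Open the resulting report in the Nsight Systems GUI to inspect
# whether kernels on different streams overlap in time
```

The GUI timeline shows which kernels ran concurrently, though it reports occupancy and timing rather than a per-SM assignment of streams.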
Nsight Systems should provide the insight you’re looking for.
I was using nvprof + nvvp, and I wasn't able to find information such as which stream runs on which SM. Also, can you please provide a pointer on how to explicitly control which stream runs on which SM's Tensor/CUDA cores?
We're asking this question because we wanted to confirm the ability to run CUDA cores and Tensor Cores in parallel, for future computing-resource-planning purposes. We don't intend to explicitly program this in production.