Hello, I am new to CUDA and I have a question regarding the overlap of CUDA Cores (for ALU/FMA) and Tensor Cores. I have seen some posts suggesting that these two operations can be overlapped because they use different execution/hardware units. I am currently writing a CUDA program and I want to pi…

Thank you so much for the response! A few quick follow-up questions. My use case first applies some preprocessing (on the CUDA cores) before sending the data to the Tensor Cores. The workloads of the two stages can differ (e.g., the former is generally cheaper). Would warp specialization still be …

Overlapping CUDA Cores and Tensor Cores


Greg April 7, 2024, 6:31pm 2

It is possible to interleave CUDA Core (ALU/FMA) instructions with Tensor Core (MMA) instructions within a single warp; however, it is usually easier to use separate warps on each SM sub-partition (warp scheduler), with some warps issuing the CUDA Core instructions and a dedicated matrix-multiply warp issuing the Tensor Core instructions. A single warp per sub-partition can be designed to reach 100% of the Tensor Core speed of light (SOL), i.e., peak throughput. See the CUTLASS documentation on warp specialization to understand this design pattern.
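To make the pattern concrete, here is a minimal sketch of warp specialization using the WMMA API. It is not how CUTLASS implements it. The kernel name `specialized_kernel`, the `preprocess` step (a simple scale plus float-to-half conversion), and the data layout (A and B stored as contiguous 16x16 tiles) are all assumptions made for illustration. Warp 0 does the CUDA Core preprocessing into a double-buffered shared-memory tile while warp 1 issues the Tensor Core MMAs on the tile prepared in the previous step.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

constexpr int TILE = 16;  // WMMA tile edge (16x16x16 shape)

// Hypothetical preprocessing step that runs on the CUDA Cores.
__device__ __forceinline__ half preprocess(float x, float scale) {
    return __float2half(x * scale);
}

// Assumed layout: A_raw holds k_tiles contiguous 16x16 row-major float tiles,
// B holds k_tiles contiguous 16x16 column-major half tiles, C is one 16x16 tile.
// Launch with 64 threads: warp 0 = producer, warp 1 = consumer.
__global__ void specialized_kernel(const float* A_raw, const half* B,
                                   float* C, int k_tiles, float scale) {
    __shared__ __align__(32) half a_tile[2][TILE * TILE];  // double buffer
    const int warp = threadIdx.x / 32;
    const int lane = threadIdx.x % 32;

    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);

    // Prologue: the producer warp fills buffer 0.
    if (warp == 0) {
        for (int i = lane; i < TILE * TILE; i += 32)
            a_tile[0][i] = preprocess(A_raw[i], scale);
    }
    __syncthreads();

    for (int kt = 0; kt < k_tiles; ++kt) {
        const int cur = kt & 1, nxt = cur ^ 1;
        if (warp == 0) {
            // Producer: preprocess the NEXT tile while the consumer multiplies.
            if (kt + 1 < k_tiles) {
                const float* src = A_raw + (kt + 1) * TILE * TILE;
                for (int i = lane; i < TILE * TILE; i += 32)
                    a_tile[nxt][i] = preprocess(src[i], scale);
            }
        } else {
            // Consumer: Tensor Core MMA on the tile prepared one step earlier.
            wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
            wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
            wmma::load_matrix_sync(a, a_tile[cur], TILE);
            wmma::load_matrix_sync(b, B + kt * TILE * TILE, TILE);
            wmma::mma_sync(acc, a, b, acc);
        }
        __syncthreads();  // coarse hand-off between the two roles
    }
    if (warp == 1)
        wmma::store_matrix_sync(C, acc, TILE, wmma::mem_row_major);
}

int main() {
    const int k_tiles = 4, n = TILE * TILE;
    float *A, *C;
    half* B;
    cudaMallocManaged(&A, k_tiles * n * sizeof(float));
    cudaMallocManaged(&B, k_tiles * n * sizeof(half));
    cudaMallocManaged(&C, n * sizeof(float));
    for (int i = 0; i < k_tiles * n; ++i) {
        A[i] = 1.0f;
        B[i] = __float2half(1.0f);
    }
    specialized_kernel<<<1, 64>>>(A, B, C, k_tiles, 0.5f);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    // Each output sums 16 * k_tiles products of (1.0 * 0.5) * 1.0.
    const float expected = 0.5f * TILE * k_tiles;
    for (int i = 0; i < n && ok; ++i) ok = (C[i] == expected);
    printf("%s\n", ok ? "ok" : "mismatch");
    cudaFree(A); cudaFree(B); cudaFree(C);
    return ok ? 0 : 1;
}
```

This needs a GPU with Tensor Cores (compile with something like `nvcc -arch=sm_70` or newer). The `__syncthreads()` hand-off keeps the sketch short but stalls both roles at every step; if the preprocessing is much cheaper than the MMA, as in the follow-up question, the producer warp will mostly sit idle at the barrier, which is fine as long as the consumer warp stays busy. Real warp-specialized kernels typically use finer-grained synchronization and deeper buffering than this.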
