A100 operations
Is it possible to have Cuda and Tensor core operations simultaneously? Would it raise the amount of TFLOPS by a small portion?
On the Volta through Ampere architectures, the SM consists of 4 sub-partitions. Each sub-partition has a warp scheduler, a register file, and execution units. The warp scheduler can dispatch 1 instruction per cycle. Tensor (*MMA) instructions are issued in 1 cycle. On the next cycle the warp scheduler can issue instructions to the FMA pipe (FP32, INT32), the ALU pipe (INT, bit manipulation), the XU (transcendental), the LSU (SHMEM, global, local), or the TEX unit. The following additional restrictions exist during the 1-N cycles after a Tensor instruction:
These cycles are generally used for post-processing or for address and data movement necessary to feed the Tensor cores.
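As for the second part of the question, a rough back-of-envelope estimate of the potential gain, assuming the published A100 datasheet peaks (19.5 TFLOPS FP32 on the CUDA cores, 312 TFLOPS FP16 Tensor core throughput) and perfect overlap with no scheduler or register-file contention:

```python
# Back-of-envelope: how much could concurrent CUDA-core FP32 work add
# on top of peak Tensor core throughput on an A100?
# Assumptions: published datasheet peaks, perfect dual-issue overlap,
# no contention for the shared warp scheduler or register file.

FP32_TFLOPS = 19.5           # CUDA cores, FP32 FMA peak
TENSOR_FP16_TFLOPS = 312.0   # Tensor cores, FP16 with FP32 accumulate

combined = TENSOR_FP16_TFLOPS + FP32_TFLOPS
gain_pct = 100.0 * FP32_TFLOPS / TENSOR_FP16_TFLOPS

print(f"combined peak: {combined:.1f} TFLOPS")  # 331.5 TFLOPS
print(f"relative gain: {gain_pct:.2f}%")        # 6.25%
```

So yes, the ceiling moves up only by a small portion (~6%), and in practice the FP32 pipe is usually busy with the address arithmetic and post-processing mentioned above rather than with extra independent FLOPs.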