It is possible to interleave CUDA Core (ALU/FMA) instructions with Tensor Core (MMA) instructions within a single warp; however, it is usually easier to have some warps on each SM sub-partition (warp scheduler) issue the CUDA Core instructions while a dedicated matrix-multiply warp issues the Tensor Core instructions. A single warp per sub-partition can be written to reach 100% SOL (speed of light, i.e. peak throughput) of the Tensor Cores. See the CUTLASS documentation on warp specialization to understand this design pattern.
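A minimal sketch of the pattern, assuming a block where warp 0 drives the Tensor Cores via the `nvcuda::wmma` API and the remaining warps do independent FP32 work on the CUDA Cores. Kernel name, tile sizes, and the split of work are illustrative assumptions, not CUTLASS's actual implementation (which uses shared-memory pipelines and named barriers between producer and consumer warps):

```cuda
#include <mma.h>
using namespace nvcuda;

// Hypothetical warp-specialized kernel: one warp issues MMA instructions,
// the others issue FMA instructions, so both pipes stay busy concurrently.
// A is a 16x16 row-major half tile, B a 16x16 col-major half tile,
// C the 16x16 float result; D is a separate float array of length n.
__global__ void warp_specialized_kernel(const half *A, const half *B,
                                        float *C, float *D, int n) {
    int warp_id = threadIdx.x / 32;

    if (warp_id == 0) {
        // "MMA warp": all 32 threads cooperate on one wmma tile,
        // keeping the sub-partition's Tensor Cores fed.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
        wmma::fill_fragment(acc, 0.0f);
        wmma::load_matrix_sync(a, A, 16);
        wmma::load_matrix_sync(b, B, 16);
        wmma::mma_sync(acc, a, b, acc);
        wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
    } else {
        // "CUDA Core warps": independent FMA work issued to the FP32 pipes;
        // these instructions dual-issue with the MMA warp's work because
        // different warp schedulers feed different execution units.
        for (int i = threadIdx.x - 32; i < n; i += blockDim.x - 32)
            D[i] = fmaf(D[i], 2.0f, 1.0f);
    }
}
```

In a real kernel the two groups of warps would also communicate (e.g. producer warps staging tiles into shared memory for the MMA warp), synchronized with `__syncthreads()` or, on Hopper, with `mbarrier`-based arrive/wait.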