CUTLASS usage for RTX 4070 (Ada)

mingomit · November 5, 2024, 7:16am

I keep getting this compile error: "/nvidia/cutlass/include\cutlass/gemm/threadblock/mma_base.h(128): error : static assertion failed with “The pipelined structure requires at least two warp-level GEMM operations.”" Can anyone help? I am using CUDA 12.6 in MSVS. The original code from ada_f8_gemm, with type float_e5m2_t for both ElementA and ElementB and tfloat32_t for the output, compiled OK. But I want to use tfloat32_t for A or B (mixed precision). I tried a few GemmShape combinations for the threadblock, warp, and instruction shapes, and they all produced the same error. How can I resolve it…
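For context, here is a minimal sketch of the kind of check this assertion appears to make. It assumes the number of warp-level GEMM operations is the warp tile's K extent divided by the instruction tile's K extent, and that this quotient must be greater than 1. `TileShape`, `warp_gemm_iterations`, and `satisfies_pipeline_requirement` are hypothetical stand-ins written for illustration, not CUTLASS types or functions, and the shapes are illustrative values rather than the ones from my build.

```cpp
// Stand-in for a GEMM tile shape (M x N x K). This is NOT cutlass::gemm::GemmShape;
// it only mirrors the idea of carrying the three extents as compile-time constants.
template <int M, int N, int K>
struct TileShape {
    static constexpr int kM = M;
    static constexpr int kN = N;
    static constexpr int kK = K;
};

// Assumed formula: how many instruction-sized steps along K fit in one warp tile.
template <typename WarpShape, typename InstructionShape>
constexpr int warp_gemm_iterations() {
    return WarpShape::kK / InstructionShape::kK;
}

// Assumed condition behind the error message: at least two warp-level steps.
template <typename WarpShape, typename InstructionShape>
constexpr bool satisfies_pipeline_requirement() {
    return warp_gemm_iterations<WarpShape, InstructionShape>() > 1;
}

// Illustrative shapes only.
using WarpA = TileShape<64, 64, 64>;  // warp tile with K = 64
using WarpB = TileShape<64, 64, 32>;  // warp tile with K = 32
using Inst  = TileShape<16, 8, 32>;   // instruction tile with K = 32

// 64 / 32 = 2 steps: the condition holds.
static_assert(satisfies_pipeline_requirement<WarpA, Inst>(),
              "expected two warp-level steps along K");

// 32 / 32 = 1 step: the condition fails, which is the situation the error describes.
static_assert(!satisfies_pipeline_requirement<WarpB, Inst>(),
              "expected a single warp-level step along K");
```

Under that assumption, changing the element type of A or B may also change the instruction shape that applies, so a warp shape that worked for the 8-bit types would not necessarily give a quotient above 1 for the new types.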
