I keep getting this compile error: "/nvidia/cutlass/include\cutlass/gemm/threadblock/mma_base.h(128): error : static assertion failed with “The pipelined structure requires at least two warp-level GEMM operations.”" Can anyone help? I am using CUDA 12.6 in MSVS. The code is the original ada_f8_gemm example, with type float_e5m2_t for both ElementA and ElementB and tfloat32_t for the output, and that version compiles fine. However, I want to use tfloat32_t for A or B (mixed precision). I tried several GemmShape combinations for the threadblock, warp, and instruction shapes, and they all produce the same error. How can I resolve it…
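For context, as far as I can tell the assertion in mma_base.h checks that the warp tile's K extent divided by the instruction's K extent is greater than 1. Below is a minimal standalone sketch of that arithmetic, written without CUTLASS so it compiles on its own. `Shape`, `warp_gemm_iterations`, and `satisfies_pipeline_check` are hypothetical names for illustration, not CUTLASS APIs, and the 64x32x32 warp tile and the 16x8x32 (FP8) and 16x8x8 (TF32) instruction shapes are assumed example values, not my actual configuration.

```cpp
// Hypothetical stand-in for a GEMM tile shape; only the K extent matters here.
template <int M, int N, int K>
struct Shape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// Number of warp-level MMA steps along K: warp tile K / instruction K.
// (Assumed to mirror the quantity the static assertion tests.)
template <typename WarpShape, typename InstructionShape>
constexpr int warp_gemm_iterations() {
  return WarpShape::kK / InstructionShape::kK;
}

// The pipelined mainloop needs at least two such steps.
template <typename WarpShape, typename InstructionShape>
constexpr bool satisfies_pipeline_check() {
  return warp_gemm_iterations<WarpShape, InstructionShape>() > 1;
}

// Assumed example shapes (not taken from the original code).
using WarpTile  = Shape<64, 32, 32>;
using InstrFp8  = Shape<16, 8, 32>;  // K = 32: 32 / 32 = 1 step, check fails
using InstrTf32 = Shape<16, 8, 8>;   // K = 8:  32 / 8  = 4 steps, check passes

static_assert(!satisfies_pipeline_check<WarpTile, InstrFp8>(),
              "one step along K is not enough for the pipelined structure");
static_assert(satisfies_pipeline_check<WarpTile, InstrTf32>(),
              "four steps along K satisfy the check");
```

If this reading is right, the error depends only on the ratio of the warp K to the instruction K, so changing the M and N extents alone would not affect it.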