Anyone can file a bug at any time to request document clarifications.