When writing a GEMM kernel for the latest Hopper architecture (e.g., with CUTLASS 3), if we use expect_tx with the barrier, do we still need to use try_wait for waiting?