In “wgmma.mma_async.sp”, there is a modifier “sp-meta”. For a sparse matrix like the one in the picture, is “sp-meta” different in different threads? How do the 32 threads get the right metadata?
Hi there @yunhsky and welcome to the NVIDIA developer forums.
I don’t have a clear idea what you are referring to here, but it seems to be CUDA related if I am not mistaken. So I hope you don’t mind me moving this post over to the CUDA programming category.
Thanks!
Did you check the PTX documentation on sparse matrix storage and their corresponding fragments?
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-sparse-matrix-storage
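To make the storage format in that section a bit more concrete, here is a minimal host-side sketch of 2:4 structured-sparsity metadata as used for 16-bit types such as .f16/.bf16: each chunk of four elements keeps at most two non-zeros, and the metadata stores a 2-bit index for each kept element. The helper names `encode_chunk` and `pack_metadata` are hypothetical, `float` stands in for the real element type, and the bit order inside the 32-bit word is an assumption, so please check it against the figures in the documentation. It does not model which thread of the warp supplies which metadata word; that mapping is what the linked section defines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compress one 4-element chunk holding at most two
// non-zeros. Writes the two kept values and returns 4 bits of metadata:
// bits [1:0] = index of the first kept element, bits [3:2] = index of the second.
inline std::uint32_t encode_chunk(const float* chunk, float* kept) {
    int idx[2] = {0, 1};
    int n = 0;
    for (int i = 0; i < 4 && n < 2; ++i) {
        if (chunk[i] != 0.0f) idx[n++] = i;
    }
    if (n == 0) {
        // All zero: any two positions work; pick the first two.
        idx[0] = 0;
        idx[1] = 1;
    } else if (n == 1) {
        // One non-zero: pair it with a zero neighbour, keeping indices increasing.
        if (idx[0] == 3) { idx[0] = 2; idx[1] = 3; }
        else             { idx[1] = idx[0] + 1; }
    }
    kept[0] = chunk[idx[0]];
    kept[1] = chunk[idx[1]];
    return static_cast<std::uint32_t>(idx[0]) |
           (static_cast<std::uint32_t>(idx[1]) << 2);
}

// Hypothetical helper: pack the metadata for up to 32 row elements (8 chunks)
// into one 32-bit word, 4 bits per chunk, chunk 0 in the lowest bits (assumed
// ordering). 'compressed' receives n / 2 kept values.
inline std::uint32_t pack_metadata(const float* row, int n, float* compressed) {
    assert(n % 4 == 0 && n <= 32);
    std::uint32_t word = 0;
    for (int c = 0; c < n / 4; ++c) {
        word |= encode_chunk(row + 4 * c, compressed + 2 * c) << (4 * c);
    }
    return word;
}
```

For example, the row `{0, 5, 0, 7, 1, 2, 0, 0}` compresses to the values `{5, 7, 1, 2}`, with the indices (1, 3) for the first chunk and (0, 1) for the second packed into the metadata word.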