I am trying to run a cuSPARSELt matmul on larger matrices (e.g., M, N, K >= 10,000). I am using an A100 with CUDA 11.7 and cuSPARSELt 0.3.0.3. I followed the suggestion in this post (cuSparseLt problem) to use the official spmma2_example (CUDALibrarySamples/spmma2_example.cpp at master · NVIDIA/CUDALibrarySamples · GitHub), but I can still only run sizes up to M=N=K=1024. Anything larger causes a segmentation fault.
Are there any instructions on how to run larger matrices (i.e., how to set the batch size, or is there a specific constraint on the maximum M, N, K)? More specifically, how can I reproduce the experiments in Figures 4-7 of this technical post (Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt | NVIDIA Technical Blog)?
Thanks!