When implementing sparse convolution, I may need to operate on points that are not adjacent in shared memory. What is the best way to load them into the matrices consumed by wmma operations? For example, suppose I have the features of 128 points in shared memory, each point with 64 in_channels, and I need to load points 1, 3, 5, 7, 9, … for one computation. Should I first copy them into contiguous memory and then use load_matrix_sync to load them, or should I implement this loading myself at the mma level?
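To make the first option concrete, here is a minimal sketch of the "gather into a contiguous staging buffer, then load_matrix_sync" approach. All names (`gather_and_load`, `staging`, `point_idx`), the 16x16x16 half-precision tile shape, and the warp-cooperative copy loop are my own assumptions for illustration, not code from the question:

```cuda
#include <mma.h>
using namespace nvcuda;

// Hypothetical sizes matching the question: 128 points x 64 in_channels, half precision.
constexpr int IN_CHANNELS = 64;
constexpr int TILE = 16;  // wmma m = n = k = 16

// Gather 16 non-adjacent points (e.g. indices 1, 3, 5, ...) from the shared-memory
// feature array into a contiguous staging tile, then hand that tile to
// wmma::load_matrix_sync, which requires a fixed leading dimension.
__device__ void gather_and_load(
    const half* smem_feats,   // [128][IN_CHANNELS], features of all points in smem
    const int*  point_idx,    // 16 point indices to gather, e.g. {1, 3, 5, ...}
    half*       staging,      // [TILE][TILE] contiguous staging buffer in smem
    int         k_tile,       // which 16-wide slice of the 64 channels to load
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major>& a_frag)
{
    // Warp-cooperative copy: each lane copies whole rows of the tile.
    int lane = threadIdx.x % 32;
    for (int r = lane; r < TILE; r += 32) {
        const half* src = smem_feats + point_idx[r] * IN_CHANNELS + k_tile * TILE;
        for (int c = 0; c < TILE; ++c)
            staging[r * TILE + c] = src[c];
    }
    __syncwarp();  // staging must be fully written before the fragment load

    // Rows are now contiguous with leading dimension TILE.
    wmma::load_matrix_sync(a_frag, staging, TILE);
}
```

The trade-off: the extra smem-to-smem copy costs bandwidth and a sync, but keeps the simple wmma API; loading per-lane at the mma/ldmatrix level avoids the copy but ties the code to the architecture-specific fragment layout.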