CUDA tiling in 3D grids and 3D blocks with shared memory

shagohod2000 · January 21, 2022, 3:03pm

Hi, I am a CUDA beginner and I want to use CUDA for a matrix-matrix multiplication. The algorithm to be optimized is as follows:
for (int a = 0; a < N; a++)
    for (int b = 0; b < N; b++)
        for (int c = 0; c < N; c++)
            for (int d = 0; d < N; d++)
                sum[a][b][c] = sum[a][b][c] + A[a][b][d] * C[d][c];
I have not managed to get it working well. I tried to use tiling and shared memory; below is the CUDA code that I have written,

but I am not quite sure if this is correct. Can someone help me?

achartiernv · January 25, 2022, 5:18pm

Hello, this forum is dedicated to discussions related to using the sanitizer tools and API.
Questions related to CUDA can be raised at CUDA - NVIDIA Developer Forums
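
For anyone landing here with the same loop nest, here is a rough sketch of one way it could be tiled. It is not the original poster's code (that was not included in the post), and it has not been tuned. It assumes `float` data stored row-major in flat arrays, and a hypothetical tile width `TILE = 16`. The names `tiledContract` and `maxErrorVsCpu` are made up for this example. The idea is that for a fixed `a` the loops are an ordinary N x N matrix product, so the sketch uses a 3D grid (x tiles `c`, y tiles `b`, z indexes `a`) with 2D blocks, which also means N must fit in the grid's z-dimension limit. Error checking of the CUDA calls is left out for brevity.

```cuda
#include <cmath>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // assumed tile width: TILE * TILE threads per block

// Computes sum[a][b][c] += sum over d of A[a][b][d] * C[d][c].
// Grid: x tiles c, y tiles b, z indexes a. Block: TILE x TILE threads.
__global__ void tiledContract(const float* A, const float* C, float* sum, int N)
{
    __shared__ float As[TILE][TILE];  // tile of A[a][b][d] for fixed a
    __shared__ float Cs[TILE][TILE];  // tile of C[d][c]

    int a = blockIdx.z;
    int b = blockIdx.y * TILE + threadIdx.y;
    int c = blockIdx.x * TILE + threadIdx.x;

    float acc = 0.0f;
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        int dA = t * TILE + threadIdx.x;  // d index this thread loads from A
        int dC = t * TILE + threadIdx.y;  // d index this thread loads from C
        // Out-of-range elements are loaded as zero so they add nothing.
        As[threadIdx.y][threadIdx.x] =
            (b < N && dA < N) ? A[((size_t)a * N + b) * N + dA] : 0.0f;
        Cs[threadIdx.y][threadIdx.x] =
            (dC < N && c < N) ? C[(size_t)dC * N + c] : 0.0f;
        __syncthreads();  // both tiles fully loaded before use

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Cs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next load overwrites
    }
    if (b < N && c < N)
        sum[((size_t)a * N + b) * N + c] += acc;
}

// Hypothetical helper: runs the kernel on small test data and returns the
// largest absolute difference from the plain four-loop CPU version.
float maxErrorVsCpu(int N)
{
    size_t nA = (size_t)N * N * N, nC = (size_t)N * N;
    std::vector<float> hA(nA), hC(nC), hSum(nA), ref(nA, 0.0f);
    for (size_t i = 0; i < nA; ++i) hA[i] = float(i % 7) - 3.0f;
    for (size_t i = 0; i < nC; ++i) hC[i] = float(i % 5) - 2.0f;

    // CPU reference: the loop nest from the question, on flat arrays.
    for (int a = 0; a < N; a++)
        for (int b = 0; b < N; b++)
            for (int c = 0; c < N; c++)
                for (int d = 0; d < N; d++)
                    ref[((size_t)a * N + b) * N + c] +=
                        hA[((size_t)a * N + b) * N + d] * hC[(size_t)d * N + c];

    float *dA, *dC, *dSum;
    cudaMalloc(&dA, nA * sizeof(float));
    cudaMalloc(&dC, nC * sizeof(float));
    cudaMalloc(&dSum, nA * sizeof(float));
    cudaMemcpy(dA, hA.data(), nA * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC.data(), nC * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dSum, 0, nA * sizeof(float));  // kernel accumulates into sum

    dim3 block(TILE, TILE, 1);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE, N);
    tiledContract<<<grid, block>>>(dA, dC, dSum, N);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hSum.data(), dSum, nA * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dC); cudaFree(dSum);

    float err = 0.0f;
    for (size_t i = 0; i < nA; ++i)
        err = std::fmax(err, std::fabs(hSum[i] - ref[i]));
    return err;
}
```

To try it, call `maxErrorVsCpu` from a `main` with a size that is not a multiple of `TILE` (for example 20), so the boundary checks are exercised, and build with `nvcc`. Comparing against the plain loops like this is probably the simplest way to answer the "is this correct" part for your own kernel as well.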
