Translating a 3D grid into array index

user366312 · April 9, 2023, 10:07pm

Suppose I want to translate the following C routine into a CUDA kernel.

I also want to use all three dimensions of the grid to run the kernel.

How can I calculate the indices of the row and column of the matrix?

void OuterProduct(float* A, float* B, float** C, int N)
{
    for(int r=0 ; r<N ; r++)
    {
        for(int c=0 ; c<N ; c++)
        {
            for(int cc=0 ; cc<N ; cc++)
            {
                (*C)[r * N + c] += A[r * N + cc] * B[cc * N + c];
            }
        }
    }
}

The following is my understanding:

__global__ void MultiplyMatKernel(I* A, I* B, I* C, int N)
{
    int dimx = N;
    int dimy = N;
    int dimz = N;

    int r = blockIdx.x * blockDim.x + threadIdx.x;
    int c = blockIdx.y * blockDim.y + threadIdx.y;
    int d = blockIdx.z * blockDim.z + threadIdx.z;

    if (r < N && c < N && d < N)
    {
        int loc_c = d * dimx * dimy + c * dimx + r;

        for (int cc = 0; cc < N; cc++)
        {
            int loc_a = (cc * dimx * dimy) + (c * dimx) + r;
            int loc_b = (d * dimx * dimy) + (cc * dimx) + r;
            C[loc_c] += A[loc_a] * B[loc_b];
        }
    }
}

Is this correct? I think not.

Can you give me the correct rationale for calculating loc_a, loc_b, and loc_c?
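
One way to reason about the three offsets, offered as a hedged sketch rather than a definitive answer: A, B, and C are all N x N matrices stored flat in row-major order, so element (row, col) sits at row * N + col and no dimx * dimy term is needed. The third grid dimension can then index the summation variable (cc in the C routine) instead of a third array axis. The C code below emulates that mapping on the host with one hypothetical "thread" per (r, c, k) triple. The names flat_index, thread_body, and emulate_grid are illustrative and do not come from the original post.

```c
#include <stddef.h>

/* Row-major offset of element (row, col) in an N x N matrix stored flat. */
size_t flat_index(int row, int col, int N)
{
    return (size_t)row * (size_t)N + (size_t)col;
}

/* Work of one hypothetical thread: a single product for the triple (r, c, k).
 * r and c select the output element; k is the summation index (cc in the
 * original C routine, d in the attempted kernel). */
void thread_body(int r, int c, int k,
                 const float *A, const float *B, float *C, int N)
{
    if (r < N && c < N && k < N) {
        /* On a GPU, the N threads sharing (r, c) would race on this update,
         * so the += would have to become an atomic add. */
        C[flat_index(r, c, N)] += A[flat_index(r, k, N)] * B[flat_index(k, c, N)];
    }
}

/* Host-side stand-in for a 3D grid: visit every (r, c, k) triple once.
 * C must be zero-initialised by the caller. */
void emulate_grid(const float *A, const float *B, float *C, int N)
{
    for (int k = 0; k < N; k++) {
        for (int c = 0; c < N; c++) {
            for (int r = 0; r < N; r++) {
                thread_body(r, c, k, A, B, C, N);
            }
        }
    }
}
```

In an actual kernel, r, c, and k would come from blockIdx, blockDim, and threadIdx as in the attempt above. Because N threads share each (r, c), the accumulation into C would need atomicAdd for float data to avoid a data race. The alternative is to keep the k loop inside a thread indexed in 2D.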

Robert_Crovella · April 10, 2023, 1:50pm
