# How do you allocate a matrix on the device?

Hi, as the topic title says, I want to know how to allocate a matrix on the device. For example:

``````
#include <stdio.h>
#include <stdlib.h>

#define N 10

// Kernel definition
__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    float **A, **B, **C;
    int i, j;

    A = (float**)malloc(N * sizeof(float*));
    for (i = 0; i < N; i++)
        A[i] = (float*)malloc(N * sizeof(float));
    B = (float**)malloc(N * sizeof(float*));
    for (i = 0; i < N; i++)
        B[i] = (float*)malloc(N * sizeof(float));

    // And C on the device: how is it allocated?

    // Kernel invocation with one block of N * N * 1 threads
    int numBlocks = 1;
}
``````

Thanks a lot!

Hello,

The simplest way is to allocate the matrix as one flat array in row-major order: http://en.wikipedia…Row-major_order

``````
// size of the matrix in bytes
size_t sizem = N * N * sizeof(float);

// pointer to the matrix
float* A = NULL;

// allocate N x N elements of type float
cudaMalloc((void**)&A, sizem);

// The same for matrices B and C
``````
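The host matrices in the question are arrays of separately `malloc`ed rows, so they are not contiguous and cannot be copied to the flat device buffer with a single `cudaMemcpy`. One way to handle that is to copy row by row. This is a minimal sketch, not the only approach: `upload_rows` and `download_rows` are hypothetical helper names, not part of the CUDA API, and the matrix is assumed to be square with `n` rows of `n` floats.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: pack an n x n host matrix, stored as n separately
// allocated rows, into one flat row-major device buffer.
// Returns the device pointer, or NULL if any CUDA call fails.
float* upload_rows(float** host, int n)
{
    assert(host != NULL && n > 0);
    float* dev = NULL;
    if (cudaMalloc((void**)&dev, (size_t)n * n * sizeof(float)) != cudaSuccess)
        return NULL;
    for (int i = 0; i < n; ++i) {
        // Row i starts at element offset i * n in the flat buffer.
        if (cudaMemcpy(dev + (size_t)i * n, host[i], n * sizeof(float),
                       cudaMemcpyHostToDevice) != cudaSuccess) {
            cudaFree(dev);
            return NULL;
        }
    }
    return dev;
}

// Hypothetical helper: copy the flat device buffer back into the host rows.
bool download_rows(float** host, const float* dev, int n)
{
    assert(host != NULL && dev != NULL && n > 0);
    for (int i = 0; i < n; ++i) {
        if (cudaMemcpy(host[i], dev + (size_t)i * n, n * sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess)
            return false;
    }
    return true;
}
```

If the host matrix can be changed, storing it as one flat array on the host as well avoids the per-row copies and needs only one `cudaMemcpy` in each direction.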

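``````
// Kernel invocation with one block of N x N threads
int numBlocks = 1;
dim3 threadsPerBlock(N, N);
MatAdd<<<numBlocks, threadsPerBlock>>>(A, B, C, N);
``````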

The kernel must then index the data in row-major order:

``````
// Kernel definition
__global__ void MatAdd(float* A, float* B, float* C, int N)
{
    // map from threadIdx/blockIdx to column and row
    int col = threadIdx.x + blockIdx.x * blockDim.x;
    int row = threadIdx.y + blockIdx.y * blockDim.y;

    // skip threads that fall outside the matrix
    if (row < N && col < N) {
        // calculate the row-major index
        int gid = row * N + col;
        C[gid] = A[gid] + B[gid];
    }
}
``````
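Putting the allocation, the copies and the launch together, the whole round trip might look like the sketch below. It assumes the host matrices are already flat row-major arrays of `N * N` floats. The wrapper name `mat_add_on_device` and the 16 x 16 block size are illustrative choices, not something prescribed by this thread.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>

// Same indexing as the kernel above: one thread per matrix element.
__global__ void MatAdd(const float* A, const float* B, float* C, int N)
{
    int col = threadIdx.x + blockIdx.x * blockDim.x;
    int row = threadIdx.y + blockIdx.y * blockDim.y;
    if (row < N && col < N) {
        int gid = row * N + col;  // row-major index
        C[gid] = A[gid] + B[gid];
    }
}

// Hypothetical host wrapper: computes hC = hA + hB on the device.
// All three host arrays hold N * N floats in row-major order.
cudaError_t mat_add_on_device(const float* hA, const float* hB, float* hC, int N)
{
    assert(hA != NULL && hB != NULL && hC != NULL && N > 0);
    size_t bytes = (size_t)N * N * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;

    // Each step runs only if everything before it succeeded.
    cudaError_t err = cudaMalloc((void**)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Enough 16 x 16 blocks to cover the matrix (assumed block size).
        dim3 threads(16, 16);
        dim3 blocks((N + threads.x - 1) / threads.x,
                    (N + threads.y - 1) / threads.y);
        MatAdd<<<blocks, threads>>>(dA, dB, dC, N);
        err = cudaGetLastError();  // reports launch configuration errors
    }
    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a NULL pointer does nothing, so this is safe on every path.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Rounding the grid size up is what makes the bounds check in the kernel necessary: when `N` is not a multiple of 16, some threads in the last blocks fall outside the matrix.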

Isn’t there a way to use the matrix with double indexing, like a 2D array? Example:

``````
// Kernel definition
__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    C[i][j] = A[i][j] + B[i][j];
}
``````

Not with the flat `float *` allocation above. You always have to compute the offset yourself, as mentioned before:

``````
offset = j + i * N;   // row i, column j
C[offset] = A[offset] + B[offset];
``````

To allocate on the device, just use the following:

``````
float *A;
cudaMalloc((void**)&A, sizeof(float) * N * N);
``````

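So far this is the only way I know of. A host-side `float **A`, built as an array of separately allocated row pointers, cannot simply be handed to a kernel, because the row pointers it holds are host addresses.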
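One possible exception, offered as a sketch under a specific assumption: if `N` is a compile-time constant, a kernel parameter declared as `float A[N][N]` is really a pointer to rows of `N` floats, so the `C[i][j]` syntax can be kept. The memory is still a single contiguous row-major allocation made with `cudaMalloc`, exactly as above. The names `MatAdd2D` and `mat_add_2d` are hypothetical, and `N` is fixed at 10 to match the question.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>

#define N 10  // must be known at compile time for this approach

// As a parameter, float A[N][N] means float (*A)[N]: a pointer to rows.
__global__ void MatAdd2D(float A[N][N], float B[N][N], float C[N][N])
{
    int j = threadIdx.x;  // column: consecutive threads, consecutive addresses
    int i = threadIdx.y;  // row
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

// Hypothetical host wrapper: computes hC = hA + hB on the device.
cudaError_t mat_add_2d(float hA[N][N], float hB[N][N], float hC[N][N])
{
    assert(hA != NULL && hB != NULL && hC != NULL);
    size_t bytes = sizeof(float) * N * N;

    // Device pointers typed as "pointer to a row of N floats".
    float (*dA)[N] = NULL;
    float (*dB)[N] = NULL;
    float (*dC)[N] = NULL;

    cudaError_t err = cudaMalloc((void**)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // One block of N x N threads, as in the question.
        MatAdd2D<<<1, dim3(N, N)>>>(dA, dB, dC);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

This only works because the row length is part of the pointer type. When the size is known only at run time, the manual `row * N + col` offset remains the way to go.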

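Just note that your access pattern is not coalesced. To improve performance, data in global memory should be accessed in a coalesced way, meaning that consecutive threads of a warp (consecutive `threadIdx.x`) access consecutive addresses.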

``````int i = threadIdx.x;