2D matrix through 1D array


I am learning CUDA and trying to implement a 2D matrix on top of a 1D array. However, my code doesn’t work so far and I couldn’t figure out the problem, so I am posting it here hoping to get some help.

The code is simple: it initializes a matrix and assigns each element a value according to its thread ID. The code compiles fine but hangs the system whenever I run it.


#include "./common/book.h"

#define WIDTH	10
#define HEIGHT	10

typedef struct
{
	float *data;
	int *width;
} matrixStruct;

__device__ void SetMatElement(matrixStruct *mat, int x, int y, float val)
{
	int width;
	width = *(mat->width);
	*(mat->data+y*width+x) = val;
}

__global__ void SetMat(matrixStruct *mat)
{
	int x = threadIdx.x+blockIdx.x*blockDim.x;
	int y = threadIdx.y+blockIdx.y*blockDim.y;
	int offset = x+y*blockDim.x*gridDim.x;
	SetMatElement(mat, x, y, (float)offset);
}

int main(int argc, char *argv[])
{
	matrixStruct *mat;
	int width;
	float *data;

	data = (float*)malloc(10*10*sizeof(float));
	for(int i = 0; i < WIDTH*HEIGHT; i++)
	{
		data[i] = 0.0f;
	}
	width = WIDTH;

	HANDLE_ERROR(cudaMalloc((void**)&mat, sizeof(matrixStruct)));
	HANDLE_ERROR(cudaMalloc((void**)&(mat->data), WIDTH*HEIGHT*sizeof(float)));
	HANDLE_ERROR(cudaMemcpy(mat->data, data, WIDTH*HEIGHT*sizeof(float), cudaMemcpyHostToDevice));
	HANDLE_ERROR(cudaMalloc((void**)&(mat->width), sizeof(float)));
	HANDLE_ERROR(cudaMemcpy(mat->width, &width, sizeof(int), cudaMemcpyHostToDevice));

	dim3 grids(WIDTH/2,HEIGHT/2);
	dim3 threads(2,2);
	SetMat<<<grids,threads>>>(mat);

	HANDLE_ERROR(cudaMemcpy(&width, mat->width, sizeof(int), cudaMemcpyDeviceToHost));
	printf("width = %d\n", width);
	HANDLE_ERROR(cudaMemcpy(data, mat->data, WIDTH*HEIGHT*sizeof(float), cudaMemcpyDeviceToHost));
	for(int i = 0; i < WIDTH; i++)
	{
		printf("data[%d] = %f\n",i,data[i]);
	}
}

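You can’t access device memory from host code. Because mat itself was allocated on the device with cudaMalloc, mat->data and mat->width can’t be read or written in your main: a call such as cudaMemcpy(mat->data, ...) dereferences a device pointer on the host.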

And before someone explains how to get this to work I’ll suggest a better alternative instead:


typedef struct
{
        float *data;
        int width;
} matrixStruct;

and set it up via

matrixStruct mat;

mat.width = WIDTH;

HANDLE_ERROR(cudaMalloc((void**)&(mat.data), WIDTH*HEIGHT*sizeof(float)));

and pass it directly as a parameter, by value, instead of through a pointer:

__global__ void SetMat(matrixStruct mat)
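For anyone who wants to see the pieces together, here is a minimal sketch of the by-value approach as a complete program. It is not the original poster's final code: the CHECK macro is a hypothetical stand-in for HANDLE_ERROR from book.h, and flatIndex and the bounds check in the kernel are additions made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define WIDTH  10
#define HEIGHT 10

// Hypothetical stand-in for HANDLE_ERROR from book.h.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

typedef struct {
    float *data;  // device pointer to WIDTH*HEIGHT floats
    int width;    // plain value, travels with the struct
} matrixStruct;

// Row-major offset of element (x, y) in a 1D array (illustrative helper).
__host__ __device__ inline int flatIndex(int x, int y, int width)
{
    return y * width + x;
}

// The struct is passed by value, so the kernel gets its own copy.
__global__ void SetMat(matrixStruct mat)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x >= mat.width || y >= HEIGHT) return;  // guard for uneven grids
    int offset = flatIndex(x, y, mat.width);
    mat.data[offset] = (float)offset;
}

int main()
{
    matrixStruct mat;            // lives in host memory
    mat.width = WIDTH;
    size_t bytes = WIDTH * HEIGHT * sizeof(float);
    CHECK(cudaMalloc((void**)&mat.data, bytes));  // only the array is on the device

    dim3 grids(WIDTH / 2, HEIGHT / 2);
    dim3 threads(2, 2);
    SetMat<<<grids, threads>>>(mat);
    CHECK(cudaGetLastError());

    float *host = (float*)malloc(bytes);
    CHECK(cudaMemcpy(host, mat.data, bytes, cudaMemcpyDeviceToHost));
    for (int i = 0; i < WIDTH; i++)
        printf("data[%d] = %f\n", i, host[i]);

    free(host);
    CHECK(cudaFree(mat.data));
    return 0;
}
```

The only device allocation is the float array; the struct holding the pointer and the width stays an ordinary host variable, so main never dereferences a device pointer.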

Thanks! The solution actually works.

However, I still have the following confusion:

My understanding is that the variable ‘mat’ and ‘mat.width’ live in host memory. Can the function that runs on device access mat.width? Thanks.


Kernel parameters are automatically copied to the device (specifically to shared memory on G80 or to constant memory on Fermi).
Thus the kernel’s mat.width is a device copy of the host’s mat.width. (And of course any changes to the device copy won’t be reflected in the host original.)
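A small experiment can make the copy semantics visible. This is only a sketch: the kernel name Touch, the helper runTouch, and the value 999 are made up for illustration. The kernel overwrites its own mat.width and also writes through mat.data; only the write through the pointer should be observable from the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct {
    float *data;  // device pointer
    int width;    // copied by value at launch
} matrixStruct;

// Hypothetical kernel: records the width it received, then changes its copy.
__global__ void Touch(matrixStruct mat)
{
    mat.data[0] = (float)mat.width;  // goes to device memory, visible after copy back
    mat.width = 999;                 // changes only the kernel's private copy
}

// Launches Touch once; returns the host-side width after the launch and
// stores the value the kernel wrote to data[0] in *first.
int runTouch(float *first)
{
    matrixStruct mat;
    mat.width = 10;
    if (cudaMalloc((void**)&mat.data, sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        exit(EXIT_FAILURE);
    }

    Touch<<<1, 1>>>(mat);

    // cudaMemcpy waits for the kernel to finish before copying.
    if (cudaMemcpy(first, mat.data, sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "kernel launch or cudaMemcpy failed\n");
        exit(EXIT_FAILURE);
    }
    cudaFree(mat.data);
    return mat.width;  // the host struct was never touched by the kernel
}

int main()
{
    float first = 0.0f;
    int width = runTouch(&first);
    printf("host mat.width = %d, data[0] = %f\n", width, first);
    return 0;
}
```

The pointer value inside the struct is copied just like width, but it still points at the same device allocation, which is why writes through mat.data survive the kernel while the assignment to mat.width does not.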

Thanks! That is insightful!