Question about host-device communications

Hey there, I’m new to CUDA and I’ve been reading the programming guide. There’s this thing that is used all throughout the guide and I cannot figure it out. For example, at pg. 31 (Matrix Manipulation with Shared Memory) they create a structure to represent an array (2D).

typedef struct {
	int width;
	int height;
	int stride;
	float* elements;
} Matrix;

Say you have the host matrix h_A and the device matrix d_A. After allocating host memory and populating the host array, they allocate enough device memory for the floats and call cudaMemcpy( d_A.elements, h_A.elements … ) with cudaMemcpyHostToDevice.

The kernel is started as kernelFunction <<< dimGrid , dimBlock >>> (d_A, …, …).

Q: How can the kernel know the values of d_A.width, d_A.height and d_A.stride if these were not copied to the device memory?

I’m sure I’m missing something big but it’s frustrating.

Cheers and thanks

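The kernel launch itself copies the contents of d_A to the GPU before the kernel runs. d_A is an ordinary struct in host memory; only its elements pointer refers to device memory. Because kernel arguments are passed by value, width, height, stride, and the pointer elements are all copied to the GPU as part of the launch. On devices of compute capability 1.x the arguments are placed in shared memory; on compute capability 2.0 and later they are passed through constant memory.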
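To make this concrete, here is a minimal sketch of the pattern, not the guide's actual example. The kernel scaleKernel, the helper runDemo, and the 8x4 size are made up for illustration; only the Matrix struct comes from the question. The point is that d_A sits in host memory, only d_A.elements is explicitly allocated and copied, and the kernel can still read A.width, A.height and A.stride because the whole struct is passed by value at launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same layout as the Matrix struct quoted in the question.
typedef struct {
    int width;
    int height;
    int stride;
    float* elements;  // holds a device pointer when used as d_A
} Matrix;

// Hypothetical kernel: the struct arrives by value, so width, height,
// stride and the pointer are all available without any explicit copy.
__global__ void scaleKernel(Matrix A, float factor)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < A.height && col < A.width)
        A.elements[row * A.stride + col] *= factor;
}

// Hypothetical helper: runs the kernel once and checks the result.
bool runDemo()
{
    const int W = 8, H = 4;
    const size_t bytes = W * H * sizeof(float);

    float h_data[W * H];
    for (int i = 0; i < W * H; ++i) h_data[i] = (float)i;

    // d_A itself is a host-side struct; only elements points to the GPU.
    Matrix d_A;
    d_A.width = W;
    d_A.height = H;
    d_A.stride = W;
    if (cudaMalloc((void**)&d_A.elements, bytes) != cudaSuccess) return false;
    cudaMemcpy(d_A.elements, h_data, bytes, cudaMemcpyHostToDevice);

    // The launch copies the struct (three ints and one pointer) to the device.
    dim3 dimBlock(4, 4);
    dim3 dimGrid((W + dimBlock.x - 1) / dimBlock.x,
                 (H + dimBlock.y - 1) / dimBlock.y);
    scaleKernel<<<dimGrid, dimBlock>>>(d_A, 2.0f);

    // This copy waits for the kernel to finish before reading the data back.
    float h_out[W * H];
    cudaError_t err = cudaMemcpy(h_out, d_A.elements, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_A.elements);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < W * H; ++i)
        if (h_out[i] != 2.0f * h_data[i]) return false;
    return true;
}

int main()
{
    bool ok = runDemo();
    printf("%s\n", ok ? "kernel saw the struct fields" : "something failed");
    return ok ? 0 : 1;
}
```

If the kernel took a Matrix* instead, the struct itself would have to be copied to device memory first, since the kernel cannot dereference a host pointer.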

Thank you, that makes it so much more clear.