Hey there, I’m new to CUDA and I’ve been reading the programming guide. There’s a pattern used all throughout the guide that I can’t figure out. For example, on p. 31 (Matrix Manipulation with Shared Memory), they define a structure to represent a 2D array:
typedef struct {
    int width;
    int height;
    int stride;
    float* elements;
} Matrix;
Say you have the host matrix h_A and the device matrix d_A. After allocating host memory and filling in the host array, they allocate enough device memory for the floats and call cudaMemcpy(d_A.elements, h_A.elements, …, cudaMemcpyHostToDevice).
The kernel is then launched as kernelFunction<<<dimGrid, dimBlock>>>(d_A, …, …).
Q: How can the kernel know the values of d_A.width, d_A.height and d_A.stride if they were never copied to device memory?
I’m sure I’m missing something big, but it’s frustrating.
Cheers and thanks :rolleyes: