cudaMallocPitch

Hello,

I’m trying to do what should be a simple task, but I want to do it the “right” way. I understand how to do these steps with small arrays, but I run into problems once the arrays reach around 64k in size.

To provide some context, here is what I need to do. I only need help with steps 1, 3, and 4; I should be able to figure out the rest.

  1. Allocate a two-dimensional host array of dimensions 1024 by 1024.

  2. Assign values to the array.

  3. Allocate a two-dimensional device array of dimensions 1024 by 1024.

  4. Copy the host array to the device array.

  5. Execute the kernel.

  6. Copy the device array back to the host array.

If anyone can help at all, it would be greatly appreciated.

Thanks,
Joe
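
The six steps above might be sketched roughly as follows with cudaMallocPitch and cudaMemcpy2D. This is only a minimal sketch under several assumptions that are not in the original post: the elements are floats, the host array is one contiguous heap block indexed as row * width + col (rather than a large stack array), and the kernel addOne, the helper runPitchedRoundTrip, and the CHECK macro are hypothetical names made up for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-check macro: prints the failing call and bails out.
// (For brevity, buffers are not freed on the error path.)
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return -1;                                                \
        }                                                             \
    } while (0)

// Hypothetical kernel: adds 1.0f to every element of a pitched 2D array.
__global__ void addOne(float* data, size_t pitch, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        // The pitch is in bytes, so step between rows via a char pointer.
        float* rowPtr = (float*)((char*)data + (size_t)row * pitch);
        rowPtr[col] += 1.0f;
    }
}

// Runs steps 1 to 6 and returns the number of mismatched elements
// (0 on success, -1 if an allocation or CUDA call fails).
int runPitchedRoundTrip(int width, int height)
{
    size_t rowBytes = (size_t)width * sizeof(float);

    // Step 1: allocate the host array as one contiguous block on the heap.
    float* hostData = (float*)malloc(rowBytes * height);
    if (hostData == NULL) return -1;

    // Step 2: assign values.
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            hostData[(size_t)r * width + c] = (float)(r * width + c);

    // Step 3: allocate the pitched device array. The runtime chooses the
    // pitch (row stride in bytes), which may be larger than rowBytes.
    float* devData = NULL;
    size_t pitch = 0;
    CHECK(cudaMallocPitch((void**)&devData, &pitch, rowBytes, height));

    // Step 4: copy host to device. The host rows are tightly packed, so
    // the source pitch is rowBytes; the destination pitch is the device one.
    CHECK(cudaMemcpy2D(devData, pitch, hostData, rowBytes,
                       rowBytes, height, cudaMemcpyHostToDevice));

    // Step 5: execute the kernel on a 2D grid that covers the array.
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    addOne<<<grid, block>>>(devData, pitch, width, height);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Step 6: copy device back to host (pitches swapped relative to step 4).
    CHECK(cudaMemcpy2D(hostData, rowBytes, devData, pitch,
                       rowBytes, height, cudaMemcpyDeviceToHost));

    // Verify that every element was incremented exactly once.
    int mismatches = 0;
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            if (hostData[(size_t)r * width + c] != (float)(r * width + c) + 1.0f)
                ++mismatches;

    CHECK(cudaFree(devData));
    free(hostData);
    return mismatches;
}

int main()
{
    int mismatches = runPitchedRoundTrip(1024, 1024);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The point to watch is that the pitch returned by cudaMallocPitch is a byte count and can exceed width * sizeof(float), so both the kernel indexing and the two cudaMemcpy2D calls have to use it for the device side while the host side uses the packed row size.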