Confusion whilst copying from host to device

I found an interesting error whilst attempting to copy data from a host array to a device array.

If the host and device arrays are declared to be of different extents i.e.



and then an attempt is made to copy the relevant host array elements to the device array i.e.


it seems that the CPU is unable to put the correct host array elements into the correct device array elements on the GPU. The array elements all get muddled up, resulting in a segfault or worse.

Is there some restriction here that I’m missing, or is this a compiler problem? Presumably it’s caused by the conversion to C when the CUDA memcpy function is called in the background, which makes me think there might be a way around it?

I tried to recreate your example in a small test program, but I was never able to get the example code to fail. I used a number of different compiler versions (11.1 to 12.5) and they all worked correctly. If you have a failing case that you can post, along with the compiler version you used, that would be very helpful.

Hi, sorry for the late reply. I managed to fix the problem… definitely my fault. I had a very well disguised mapping bug, but it’s fixed now and the copy from host to device works a treat.

