Yeah, this is pretty fundamental to CUDA. Device memory is on the graphics card, separated from the CPU by a PCI-Express bus, so you have to explicitly copy data to or from the device when you need it. All other operations on device memory happen in device code, i.e. in __global__ functions (kernels) and whatever they call.
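To make the split concrete, here is a minimal sketch of the usual pattern. The names scaleKernel, scaleOnDevice, h_data and d_data are made up for illustration and are not taken from any SDK sample. The host function only hands the device pointer to the runtime and to the kernel launch; the only place the pointer is dereferenced is inside the __global__ function.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: runs on the GPU, so it is allowed to dereference d_data.
// (Hypothetical kernel name, for illustration only.)
__global__ void scaleKernel(float *d_data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_data[i] *= factor;
}

// Host code: never dereferences d_data. It only allocates, copies,
// launches the kernel, and copies the result back.
// Returns true on success. (Hypothetical helper, for illustration only.)
bool scaleOnDevice(float *h_data, int n, float factor)
{
    if (n <= 0)
        return true;  // nothing to do; also avoids a zero-block launch

    float *d_data = NULL;
    size_t bytes = n * sizeof(float);

    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess)
        return false;

    // Host -> device copy across the bus.
    cudaError_t err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up
        scaleKernel<<<blocks, threads>>>(d_data, factor, n);
        err = cudaGetLastError();  // catches launch configuration errors
    }

    // Device -> host copy; this also waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err == cudaSuccess;
}
```

Calling scaleOnDevice(v, 4, 2.0f) on a host array v should double its four elements in place. Writing d_data[0] = 1.0f directly inside scaleOnDevice would be a host-side dereference of a device pointer, which is exactly the case that has to go through cudaMemcpy or a kernel instead.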
I understood the need to copy data from host to device from all the samples I saw, but I thought I could put the device code in the same function…
Maybe the people who write the programming guide could put this explanation there?
Or if it is there, point me to the right place, because I didn't find it…
This is the first relevant quote I found, section 4.2.2.4 (in CUDA 2.0 guide, not sure what section # it is in earlier guides):
“Dereferencing a pointer either to global or shared memory in code that is executed
on the host or to host memory in code that is executed on the device results in an
undefined behavior, most often in a segmentation fault and application termination.”