When I do a cudaMalloc of a few small blocks, I normally get device memory addresses around 0x05100000 (GeForce 470 with CUDA 3.1 on Windows). And when I do a malloc, I get host memory addresses around 0x00590000 (32-bit Windows 7). It’s hard to confuse pointers into these two address ranges, because the host CPU gets a segv if it tries to dereference a pointer to device memory. Likewise, the device gets a segv (or causes some kind of crash in a thread) if it tries to dereference a pointer to host memory. This is good, because it is easy to make the mistake of passing a host pointer to the device. It happens all the time, even if you use a variable naming convention like “int * varh” vs. “int * vard”.
Unfortunately, I just found out that if I allocate big blocks on the device, the device and host address ranges eventually overlap. I can create an example where a host pointer is passed to a device kernel and does not cause a segv on the device. This is bad, because I would like to catch these errors. It also means that debuggers like Nsight and Ocelot most likely cannot catch them automatically, because no segv happens. The only way, it seems, to avoid confusing device and host pointers is to wrap the device pointer in a new type, or to create a pointerless data type that works on both host and device. Any other alternatives you’ve implemented? Yep, C++ pointers suck.
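To make the wrapper idea concrete, here is a minimal sketch of what I have in mind. DevicePtr, device_raw, and launch_scale are names I made up for illustration, and it assumes a C++11 host compiler. The launcher is only a stub: in real code it would pass data.device_raw() to a kernel, and the raw pointer would come from cudaMalloc.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical wrapper: marks a raw address as pointing into device memory.
template <typename T>
class DevicePtr {
public:
    DevicePtr() : raw_(nullptr) {}

    // explicit: a plain T* (e.g. from malloc) never converts silently.
    explicit DevicePtr(T* raw) : raw_(raw) {}

    // The only way to get the raw address back out; easy to search for.
    T* device_raw() const { return raw_; }

    bool is_null() const { return raw_ == nullptr; }

    // Pointer arithmetic stays inside the wrapper type.
    DevicePtr operator+(std::ptrdiff_t n) const { return DevicePtr(raw_ + n); }

    // Deliberately no operator*, operator-> or operator[]:
    // host code cannot dereference a DevicePtr by accident.

private:
    T* raw_;
};

// Compile-time checks that the two pointer kinds do not mix implicitly.
static_assert(!std::is_convertible<float*, DevicePtr<float>>::value,
              "a host pointer must not convert to DevicePtr implicitly");
static_assert(!std::is_convertible<DevicePtr<float>, float*>::value,
              "a DevicePtr must not decay to a raw pointer");

// Hypothetical host-side launcher stub. Real code would launch a kernel
// with data.device_raw(); this stub only validates its argument.
inline void launch_scale(DevicePtr<float> data, std::size_t n) {
    assert(n == 0 || !data.is_null());
}
```

With this, calling launch_scale(hostBuffer, n) with a float* from malloc fails to compile, while launch_scale(DevicePtr<float>(p), n) makes the claim visible at the call site. It cannot stop someone from wrapping a host pointer by hand, so it narrows the mistake rather than eliminating it.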
I was wondering, have other people noticed an overlap of device and host pointer addresses? Did it cause a really bad failure that was hard to solve? On 64-bit systems, what are the typical memory addresses for device and host blocks? Thanks! (Stupid me, I’m still on 32-bit Windows and I have 4 GB of memory! I’m tired of having to reinstall Windows again and again, so I can’t test this myself for the moment.)