Yikes! Device and host memory addresses can overlap.

When I cudaMalloc a few small blocks, I normally get device addresses around 0x05100000 (GeForce 470 with CUDA 3.1 on Windows). When I malloc, I get host addresses around 0x00590000 (32-bit Windows 7). It’s hard to confuse pointers in these two ranges, because the host CPU gets a segv if it tries to dereference a pointer to device memory. Likewise, the device gets a segv (or causes some kind of crash in a thread) if it tries to dereference a pointer to host memory. This is good, because it is easy to make the mistake of passing a host pointer to the device. It happens all the time, even if you use a variable naming convention like “int * varh” vs. “int * vard”.

Unfortunately, I just found out that if I allocate big blocks on the device, the device and host address ranges eventually overlap. I can construct an example where a host pointer is passed to a device kernel and does not cause a segv on the device. This is bad because I would like to prevent these errors. It also means that debuggers like Nsight and Ocelot most likely cannot catch these errors automatically, because a segv won’t happen. The only way, it seems, to avoid confusing device and host pointers is to wrap the device pointer in a new type, or to create a pointerless data type that works on both host and device. Any other alternatives you’ve implemented? Yep, C++ pointers suc’.
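To make the "wrap the device pointer in a new type" idea concrete, here is a minimal sketch in plain C++. The name DevicePtr and its members are hypothetical and not part of CUDA. The sketch makes no CUDA calls so that it builds with any C++11 compiler; in real code the raw address would come from cudaMalloc.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical wrapper (not a CUDA type). It stores a device address but
// deliberately has no operator* or operator->, so host code cannot
// dereference it by accident.
template <typename T>
class DevicePtr {
public:
    DevicePtr() = default;

    // explicit: a raw T* never converts silently into a DevicePtr<T>.
    explicit DevicePtr(T* raw) : raw_(raw) {}

    // The only way back to the raw address; meant to be called at the
    // kernel launch site and nowhere else.
    T* raw() const { return raw_; }

    // Element-wise offset, so sub-ranges can be passed without unwrapping.
    DevicePtr operator+(std::ptrdiff_t n) const { return DevicePtr(raw_ + n); }

    bool operator==(const DevicePtr& other) const { return raw_ == other.raw_; }

private:
    T* raw_ = nullptr;
};

// Compile-time checks: host and device pointers do not mix implicitly.
static_assert(!std::is_convertible<int*, DevicePtr<int>>::value,
              "a raw pointer must not become a DevicePtr implicitly");
static_assert(!std::is_convertible<DevicePtr<int>, int*>::value,
              "a DevicePtr must not decay to a raw pointer implicitly");
```

A launch wrapper declared to take DevicePtr<int> would then reject a pointer returned by malloc at compile time, whatever its numerical value, so the check no longer depends on a segv.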

I was wondering, have other people noticed an overlap of device and host pointer addresses? Did it cause a really bad failure that was hard to solve? On 64-bit systems, what are the typical memory addresses for device and host blocks? Thanks! (Stupid me, I’m still on 32-bit Windows and I have 4 GB of memory! I’m tired of having to reinstall Windows again and again, so I can’t test this myself for the moment.)

Surely this is expected behaviour - the GPU and CPU each have their own (virtual) address spaces, so it’s completely acceptable for each to have pointers whose numerical values overlap. Right now, my workstation probably has active pointers which have identical numerical values to those on yours. Is that an error? I strongly suspect that Ocelot can catch these sorts of errors, because it will do something a little smarter than just watching for a segfault. Wrapping device pointers in a class and letting the type system take care of things would be the obvious way forward.

It can, though for kind of a funny reason. Ocelot allocates memory in the host address space when you call cudaMalloc for emulator and CPU devices. It maintains a table of all valid allocations, which can be checked to detect out-of-bounds accesses. Because they live in the same process as the host application, the allocations returned by Ocelot’s cudaMalloc will never overlap with allocations returned by regular malloc.
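A table of valid allocations like the one described above could look roughly like this. This is only an illustrative sketch, not Ocelot’s actual code; AllocationTable and its methods are made-up names, and addresses are kept as plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical allocation table: maps each block's base address to its size.
class AllocationTable {
public:
    // Record a block, e.g. right after an allocation call returns.
    void add(std::uintptr_t base, std::size_t size) {
        assert(size > 0);
        blocks_[base] = size;
    }

    // Forget a block, e.g. when it is freed.
    void remove(std::uintptr_t base) { blocks_.erase(base); }

    // True if [addr, addr + bytes) lies entirely inside one recorded block.
    bool contains(std::uintptr_t addr, std::size_t bytes) const {
        auto it = blocks_.upper_bound(addr);  // first block with base > addr
        if (it == blocks_.begin()) return false;
        --it;                                 // block with largest base <= addr
        const std::uintptr_t offset = addr - it->first;
        return offset < it->second && bytes <= it->second - offset;
    }

private:
    std::map<std::uintptr_t, std::size_t> blocks_;
};
```

With a table like this, every emulated load or store can be validated against the recorded blocks, so an access through a stray host pointer is reported even when the address happens to be mapped and no segv would occur.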