device memory collision from cudaHostGetDevicePointer

device memory collison from cudaHostGetDevicePointer

Device memory address collision between cudaMalloc allocation and
cudaHostGetDevicePointer used to get the device address for some pinned host memory

  1. Did a
    HANDLE_ERROR( cudaMalloc( (void**)&dev_S , nScn * sizeof (struct Sdef)));
    HANDLE_ERROR( cudaMemcpy( dev_S, *S, nScn * sizeof (struct Sdef),cudaMemcpyHostToDevice ) );

to reserve some global device memory for some stuff.
printf(“%s:%ld Scn set (in %p ,dev global out %p)\n”, FILE,LINE, *S,dev_S);
tdgpu.cu:602 Scn set (in 0000000000670040 ,dev global out 000000FC00200000)
*****************

SO this dev_S is the address in the global memory space to use.

  1. THEN allcated some pinned host memory
    HANDLE_ERROR (cudaHostAlloc( (void**) &PB->P, sz2,cudaHostAllocMapped));

    and then
    HANDLE_ERROR( cudaHostGetDevicePointer(&PB->devP, PB->P, 0));
    got the device pointer to it in devP

    printf(“%s:%ld (P,O):(%p,%p)->(%p,%p) ok\n”, FILE,LINE, PB->P,PB->O, PB->devP,PB->devO);
    tdgpu.cu:214 (P,O):(0000000008B10000,000000000A1A0000)->(000000FC00200000,000000FC018A0000) ok
    ****************
    P O → devP, devO

WHAT!
the value I get in device address space for this P is the SAME as the address I got when I did the
cudaMalloc for my scenario’s both 000000FC00200000.

what driver version and toolkit are you using?

toolkit 3.2 RC2 driver ver 263.06 rev 3.2 v0.2.1221 is the cuda driver I think this is right.
I think one may have to do all the CuadHostGetDevicePointer’s you are going to do up front, before you do any cudaMalloc calls.

can you try a newer driver?

So your problem is solved?

From CUDA reference manual:

cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.

I am doing that and checking with canMapHostMemory and I am supposed to be able to map safely.
However the mapped value I get falls right on top of the device address obtained by cudaMalloc earlier.

Thanks very much for looking at this. I wish the problem was this simple.

host int oktomap(void)
{
cudaDeviceProp prop;
int whichDevice=0;
HANDLE_ERROR (cudaGetDevice( &whichDevice));
HANDLE_ERROR (cudaGetDeviceProperties( &prop,whichDevice));
if ( prop.canMapHostMemory != 1)
{
fprintf(stderr,“device cant map host memory\n”);
return(0);
}
return (1);
};

HANDLE_ERROR(cudaSetDevice(0));
HANDLE_ERROR( cudaSetDeviceFlags( cudaDeviceMapHost));
assert( oktomap() == 1);
printf(“%s:%ld ok to map host memory\n”, FILE,LINE);
~
~
~
~
~
~

I think I found the problem. One needs to do the cudaDeviceMapHost not only before doing the actual host mapping but also before any cudaMalloc calls as well. If you do cudamalloc calls
without first telling cuda you are going to map host memory, there is conflict. So its not enough to precede the cudaHostAlloc calls with the cudaSetDeviceflags(cudaDeviceMapHost) precede
all your device memory management stuff. I think this is what I did. When I put the set flags call up front of ALL cuda memory stuff, seems to work.