to reserve some global device memory for some stuff.
printf(“%s:%ld Scn set (in %p ,dev global out %p)\n”, FILE,LINE, *S,dev_S);
tdgpu.cu:602 Scn set (in 0000000000670040 ,dev global out 000000FC00200000)
*****************
SO this dev_S is the address in the global memory space to use.
THEN allcated some pinned host memory
HANDLE_ERROR (cudaHostAlloc( (void**) &PB->P, sz2,cudaHostAllocMapped));
and then
HANDLE_ERROR( cudaHostGetDevicePointer(&PB->devP, PB->P, 0));
got the device pointer to it in devP
printf(“%s:%ld (P,O):(%p,%p)->(%p,%p) ok\n”, FILE,LINE, PB->P,PB->O, PB->devP,PB->devO);
tdgpu.cu:214 (P,O):(0000000008B10000,000000000A1A0000)->(000000FC00200000,000000FC018A0000) ok
****************
P O → devP, devO
WHAT!
the value I get in device address space for this P is the SAME as the address I got when I did the
cudaMalloc for my scenario’s both 000000FC00200000.
toolkit 3.2 RC2 driver ver 263.06 rev 3.2 v0.2.1221 is the cuda driver I think this is right.
I think one may have to do all the CuadHostGetDevicePointer’s you are going to do up front, before you do any cudaMalloc calls.
I am doing that and checking with canMapHostMemory and I am supposed to be able to map safely.
However the mapped value I get falls right on top of the device address obtained by cudaMalloc earlier.
Thanks very much for looking at this. I wish the problem was this simple.
host int oktomap(void)
{
cudaDeviceProp prop;
int whichDevice=0;
HANDLE_ERROR (cudaGetDevice( &whichDevice));
HANDLE_ERROR (cudaGetDeviceProperties( &prop,whichDevice));
if ( prop.canMapHostMemory != 1)
{
fprintf(stderr,“device cant map host memory\n”);
return(0);
}
return (1);
};
I think I found the problem. One needs to do the cudaDeviceMapHost not only before doing the actual host mapping but also before any cudaMalloc calls as well. If you do cudamalloc calls
without first telling cuda you are going to map host memory, there is conflict. So its not enough to precede the cudaHostAlloc calls with the cudaSetDeviceflags(cudaDeviceMapHost) precede
all your device memory management stuff. I think this is what I did. When I put the set flags call up front of ALL cuda memory stuff, seems to work.