Hello,
I’m trying to build a CUDA app that allocates as much memory as possible on the graphics card and then works with it.
However, I ran into some strange behaviour. I allocate memory in chunks of 1 MB, looping a call to cudaMalloc until it fails. Once it fails, I have allocated as much memory as possible, so I can start working with it.
It works perfectly on XP.
On Vista (32-bit or 64-bit, 8800 GTS or 9800 GTX, 177.92 or earlier drivers), I see the following behaviour.
Let’s say I allocate 1 MB of memory:
//init Cuda
CUT_DEVICE_INIT(argc,argv);
//One memory block of one MB
CUDA_SAFE_CALL(cudaMalloc((void**) &dev_mem[0], MB));
//Set arbitrary value
set_value<<<1,1>>>(dev_mem[0],15,238);
CUDA_SAFE_CALL(cudaThreadSynchronize());
//read value in device memory
get_value<<<1,1>>>(dev_mem[0],15,device_result);
CUDA_SAFE_CALL(cudaThreadSynchronize());
//copy value from device to host
CUDA_SAFE_CALL( cudaMemcpy(&host_result, device_result, sizeof(myCuResult), cudaMemcpyDeviceToHost));
//display
printf("Value : %d\n",host_result.rValue);
I get the following output on both Vista and XP:
Using device 0: GeForce 9800 GTX/9800 GTX+
allocated 1 blocks
Value : 238
If I allocate as much memory as possible:
//init Cuda
CUT_DEVICE_INIT(argc,argv);
//allocate as many blocks as possible
//call cudaMalloc directly here: CUDA_SAFE_CALL aborts on failure, so the comparison would never see the failing call
while (cudaMalloc((void**) &dev_mem[numBlocks], MB) == cudaSuccess)
{
    numBlocks++;
}
printf("allocated %d blocks\n",numBlocks);
//Set arbitrary value
set_value<<<1,1>>>(dev_mem[0],15,238);
CUDA_SAFE_CALL(cudaThreadSynchronize());
//read value in device memory
get_value<<<1,1>>>(dev_mem[0],15,device_result);
CUDA_SAFE_CALL(cudaThreadSynchronize());
//copy value from device to host
CUDA_SAFE_CALL( cudaMemcpy(&host_result, device_result, sizeof(myCuResult), cudaMemcpyDeviceToHost));
//display
printf("Value : %d\n",host_result.rValue);
I get a wrong value on Vista:
Using device 0: GeForce 9800 GTX/9800 GTX+
allocated 469 blocks
Value : 0
but not on XP:
Using device 0: GeForce 9800 GTX/9800 GTX+
allocated 456 blocks
Value : 238
If I limit myself to 462 blocks instead of 469 on Vista, everything runs fine.
My guess is that cudaMalloc should fail earlier, since some device memory has to stay free for CUDA itself to run. This seems to be implemented on XP, but not on Vista.
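If that guess is right, a possible workaround would be to allocate greedily and then hand back a safety margin before launching any kernels. A minimal sketch (the 16 MB margin and the MAX_BLOCKS bound are assumptions on my part, not measured values):

```cpp
// Sketch: greedy 1 MB allocation, then free a safety margin so the
// driver keeps some device memory for kernel launches.
#include <cstdio>
#include <cuda_runtime.h>

#define MB (1024 * 1024)
#define MAX_BLOCKS 4096
#define SAFETY_BLOCKS 16   // assumed margin: 16 x 1 MB given back

int main()
{
    void* dev_mem[MAX_BLOCKS];
    int numBlocks = 0;

    // Allocate 1 MB chunks until cudaMalloc fails.
    while (numBlocks < MAX_BLOCKS &&
           cudaMalloc(&dev_mem[numBlocks], MB) == cudaSuccess)
    {
        numBlocks++;
    }

    // Give back a margin before doing any real work.
    int margin = (numBlocks < SAFETY_BLOCKS) ? numBlocks : SAFETY_BLOCKS;
    for (int i = 0; i < margin; i++)
        cudaFree(dev_mem[--numBlocks]);

    printf("keeping %d blocks\n", numBlocks);
    return 0;
}
```

On my Vista box, freeing 7 blocks (469 down to 462) was enough, but I have no idea whether that number is stable across driver versions.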
Any thoughts or help? Things I’ve already tried:
- one big malloc of X MB instead of X mallocs of 1 MB: same behaviour
- checking DLLs and such: same
- Vista 32-bit instead of 64-bit: same
- with or without Aero: same
- older drivers: same
- CUDA 1.1 or 2.0: same
- an 8800 GTS: same
- cuMemGetInfo: the free memory returned is always larger than what I can actually allocate (if I try to malloc the returned value: error)
- initializing the device with CUT_DEVICE_INIT or cudaSetDevice(): same.
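Since cuMemGetInfo over-reports what can actually be allocated, another hedge might be to treat its answer as an upper bound and only request a fraction of it. A sketch using the runtime-API counterpart cudaMemGetInfo (available in newer toolkits; the 90% factor is an arbitrary assumption):

```cpp
// Sketch: treat reported free memory as an upper bound and request
// only a fraction of it in one big allocation.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess)
        return 1;

    // Arbitrary assumption: ask for 90% of the reported free memory,
    // leaving the rest for the driver and kernel launches.
    size_t request = (freeBytes / 10) * 9;

    void* dev_mem = NULL;
    if (cudaMalloc(&dev_mem, request) == cudaSuccess)
    {
        printf("allocated %zu of %zu reported free bytes\n",
               request, freeBytes);
        cudaFree(dev_mem);
    }
    return 0;
}
```

This obviously wastes some memory compared to the greedy loop, but it might avoid tripping whatever reservation Vista’s driver model needs.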