cudaMalloc on Vista: strange behaviour (works on XP, fails on Vista)

Hello,

I’m trying to build an app in CUDA that allocates as much memory as possible on the graphics card and then works with it.

However, I ran into some strange behaviour. I allocate memory in 1 MB chunks, calling cudaMalloc in a loop until it fails. Once it fails, I have allocated as much memory as possible and can start working with it.

It works perfectly on XP.

On Vista (32-bit or 64-bit, 8800 GTS or 9800 GTX, the 177.92 or earlier drivers), I see the following behaviour:

Let’s say I allocate 1 MB of memory:

//init CUDA
CUT_DEVICE_INIT(argc, argv);

//one memory block of 1 MB
CUDA_SAFE_CALL(cudaMalloc((void**) &dev_mem[0], MB));

//set an arbitrary value
set_value<<<1,1>>>(dev_mem[0], 15, 238);
CUDA_SAFE_CALL(cudaThreadSynchronize());

//read the value back from device memory
get_value<<<1,1>>>(dev_mem[0], 15, device_result);
CUDA_SAFE_CALL(cudaThreadSynchronize());

//copy the value from device to host
CUDA_SAFE_CALL(cudaMemcpy(&host_result, device_result, sizeof(myCuResult), cudaMemcpyDeviceToHost));

//display
printf("Value : %d\n", host_result.rValue);

I get the following output on both Vista and XP:

Using device 0: GeForce 9800 GTX/9800 GTX+

allocated 1 blocks

Value : 238

If I allocate as much memory as possible:

//init CUDA
CUT_DEVICE_INIT(argc, argv);

//allocate as many blocks as possible
//(no CUDA_SAFE_CALL here: we need the return value, and a failure is expected)
while (cudaMalloc((void**) &dev_mem[numBlocks], MB) == cudaSuccess)
{
	numBlocks++;
}
printf("allocated %d blocks\n", numBlocks);

//set an arbitrary value
set_value<<<1,1>>>(dev_mem[0], 15, 238);
CUDA_SAFE_CALL(cudaThreadSynchronize());

//read the value back from device memory
get_value<<<1,1>>>(dev_mem[0], 15, device_result);
CUDA_SAFE_CALL(cudaThreadSynchronize());

//copy the value from device to host
CUDA_SAFE_CALL(cudaMemcpy(&host_result, device_result, sizeof(myCuResult), cudaMemcpyDeviceToHost));

//display
printf("Value : %d\n", host_result.rValue);

I get an error on Vista:

Using device 0: GeForce 9800 GTX/9800 GTX+

allocated 469 blocks

Value : 0

but not on XP:

Using device 0: GeForce 9800 GTX/9800 GTX+

allocated 456 blocks

Value : 238

If I limit myself to 462 blocks instead of 469 on Vista, everything runs fine.

My guess is that cudaMalloc should fail earlier, since some device memory has to stay free for CUDA itself to run. That seems to be handled on XP, but not on Vista.

Any thoughts or help? Things I’ve already tried:

  • one big malloc of X MB instead of X mallocs of 1 MB: same behaviour

  • checked DLLs and such: same

  • Vista 32-bit instead of 64-bit: same

  • with or without Aero: same

  • older drivers: same

  • CUDA 1.1 or 2.0: same

  • an 8800 GTS: same

  • cuMemGetInfo: the free memory returned is always larger than what I can actually allocate (if I try to malloc the returned value: error)

  • initializing the device with CUT_DEVICE_INIT or cudaSetDevice(): same


I just discovered something new: if I allocate all available memory, then free some of it, it still fails on Vista:

while (cudaMalloc((void**) &dev_mem[numBlocks], MB) == cudaSuccess)
{
	numBlocks++;
}

//free the last 100 blocks
for (int i = numBlocks; i > numBlocks - 100; i--)
	cudaFree(dev_mem[i - 1]);

numBlocks -= 100;

==> Fails, it prints 0.

This means that allocating too much memory in a program on Vista leads to a crash, even if you free some of it afterwards.

The more I run into it, the more I think it’s a driver bug :/

I believe you are experiencing a known bug that, according to NVIDIA, will be fixed in the next driver release.

Isn’t it easier to request how much memory is free, and then allocate that amount?

Not really, as cuMemGetInfo always returns more free memory than there really is, under both Vista and XP this time.

So I’ll get one failure for sure, and on Vista that means a crash.

Is this bug documented somewhere? If you say it’s a known bug… where did you read about it? The only workaround I’ve found is to leave X% of the memory free under Vista, which defeats the purpose of my program :/

There is a thread on the forum about allocating and deallocating memory in a loop; there it was acknowledged as a bug. Not 100% sure it’s the same bug, but it sounds the same.

Thanks for your help, but it isn’t the same problem. I read quite a bit before I posted, and the people over there could work around the bug by allocating the memory with one big malloc at the beginning.

I’d like to do that, but I want to eat up all available memory, and no function tells me exactly how much that is, as cuMemGetInfo only returns an estimate.

That is, if I run something like:

cuMemGetInfo(&free, &total);
cudaMalloc((void**) &dev_mem, free);

I get an error, as the malloc fails. That happens on both XP and Vista. It works if I do the malloc about 3 to 4 MB below the returned value.

I did this as a workaround (i.e. I malloc in a loop, using 1 MB chunks, and stop at 97% of the free memory amount returned by cuMemGetInfo), but since I want to be able to plow through the memory as a whole, it impacts my program :/

I’ll wait until I’m registered as a CUDA developer and submit this as a bug, as I was told to by email.

Did you have any luck with this issue? I’m trying to do exactly the same thing, i.e. allocate the entire memory. What was your final solution to this problem?

Regards,

Kumar