Hello,
I’m trying to build a CUDA app that allocates as much memory as possible on the graphics card and then works with it.
However, I ran into some strange behaviour. I allocate memory in chunks of 1 MB, looping a call to cudaMalloc until it fails. Once it fails, I have allocated as much memory as possible, so I can start working with it.
It works perfectly on XP.
On Vista (32-bit or 64-bit, 8800 GTS or 9800 GTX, 177.92 or earlier drivers), I see the following behaviour.
Let’s say I allocate 1 MB of memory:
//init Cuda
CUT_DEVICE_INIT(argc,argv);
//One memory block of one MB
CUDA_SAFE_CALL(cudaMalloc((void**) &dev_mem[0], MB));
//Set arbitrary value
set_value<<<1,1>>>(dev_mem[0],15,238);
CUDA_SAFE_CALL(cudaThreadSynchronize());
//read value in device memory
get_value<<<1,1>>>(dev_mem[0],15,device_result);
CUDA_SAFE_CALL(cudaThreadSynchronize());
//copy value from device to host
CUDA_SAFE_CALL( cudaMemcpy(&host_result, device_result, sizeof(myCuResult), cudaMemcpyDeviceToHost));
//display
printf("Value : %d\n",host_result.rValue);
I get the following output on both Vista and XP:
Using device 0: GeForce 9800 GTX/9800 GTX+
allocated 1 blocks
Value : 238
If I allocate as much memory as possible:
//init Cuda
CUT_DEVICE_INIT(argc,argv);
//allocate as many blocks as possible
//call cudaMalloc directly here: CUDA_SAFE_CALL aborts on failure, so the comparison would never see the failing call
while (cudaMalloc((void**) &dev_mem[numBlocks], MB) == cudaSuccess)
{
    numBlocks++;
}
printf("allocated %d blocks\n",numBlocks);
//Set arbitrary value
set_value<<<1,1>>>(dev_mem[0],15,238);
CUDA_SAFE_CALL(cudaThreadSynchronize());
//read value in device memory
get_value<<<1,1>>>(dev_mem[0],15,device_result);
CUDA_SAFE_CALL(cudaThreadSynchronize());
//copy value from device to host
CUDA_SAFE_CALL( cudaMemcpy(&host_result, device_result, sizeof(myCuResult), cudaMemcpyDeviceToHost));
//display
printf("Value : %d\n",host_result.rValue);
I get a wrong value on Vista:
Using device 0: GeForce 9800 GTX/9800 GTX+
allocated 469 blocks
Value : 0
but not on XP:
Using device 0: GeForce 9800 GTX/9800 GTX+
allocated 456 blocks
Value : 238
If I limit myself to 462 blocks instead of 469 on Vista, everything runs fine.
My guess is that cudaMalloc should fail earlier, since some device memory has to stay free for CUDA itself to run. This seems to be implemented on XP, but not on Vista.
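If that guess is right, a possible workaround would be to allocate greedily and then hand back a safety margin before launching any kernels. A minimal sketch (the 16 MB margin and the MAX_BLOCKS bound are assumptions on my part, not measured values):

```cpp
// Sketch: greedy 1 MB allocation, then free a safety margin so the
// driver keeps some device memory for kernel launches.
#include <cstdio>
#include <cuda_runtime.h>

#define MB (1024 * 1024)
#define MAX_BLOCKS 4096
#define SAFETY_BLOCKS 16   // assumed margin: 16 x 1 MB given back

int main()
{
    void* dev_mem[MAX_BLOCKS];
    int numBlocks = 0;

    // Allocate 1 MB chunks until cudaMalloc fails.
    while (numBlocks < MAX_BLOCKS &&
           cudaMalloc(&dev_mem[numBlocks], MB) == cudaSuccess)
    {
        numBlocks++;
    }

    // Give back a margin before doing any real work.
    int margin = (numBlocks < SAFETY_BLOCKS) ? numBlocks : SAFETY_BLOCKS;
    for (int i = 0; i < margin; i++)
        cudaFree(dev_mem[--numBlocks]);

    printf("keeping %d blocks\n", numBlocks);
    return 0;
}
```

On my Vista box, freeing 7 blocks (469 down to 462) was enough, but I have no idea whether that number is stable across driver versions.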
Any thoughts or help? Things I’ve already tried:
- one big malloc of X MB instead of X mallocs of 1 MB: same behaviour
- checking DLLs and such: same
- Vista 32-bit instead of 64-bit: same
- with or without Aero: same
- older drivers: same
- CUDA 1.1 or 2.0: same
- an 8800 GTS: same
- cuMemGetInfo: the free memory returned is always larger than what I can actually allocate (if I try to malloc the returned value: error)
- initializing the device with CUT_DEVICE_INIT or cudaSetDevice(): same.
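Since cuMemGetInfo over-reports what can actually be allocated, another hedge might be to treat its answer as an upper bound and only request a fraction of it. A sketch using the runtime-API counterpart cudaMemGetInfo (available in newer toolkits; the 90% factor is an arbitrary assumption):

```cpp
// Sketch: treat reported free memory as an upper bound and request
// only a fraction of it in one big allocation.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess)
        return 1;

    // Arbitrary assumption: ask for 90% of the reported free memory,
    // leaving the rest for the driver and kernel launches.
    size_t request = (freeBytes / 10) * 9;

    void* dev_mem = NULL;
    if (cudaMalloc(&dev_mem, request) == cudaSuccess)
    {
        printf("allocated %zu of %zu reported free bytes\n",
               request, freeBytes);
        cudaFree(dev_mem);
    }
    return 0;
}
```

This obviously wastes some memory compared to the greedy loop, but it might avoid tripping whatever reservation Vista’s driver model needs.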