gtx295 global memory out of memory

Hi, I have a problem with a GTX 295.
This is my code:

int count;
cudaGetDeviceCount(&count);
omp_set_num_threads(count);              // one OpenMP thread per CUDA device
#pragma omp parallel
{
    int deviceIdx = omp_get_thread_num();
    cudaSetDevice(deviceIdx);            // bind this thread to its own device
    size_t bytes = 400000000;
    float *d;
    cudaMalloc((void **)&d, bytes);
    printf("(%d,%s)\n", deviceIdx, cudaGetErrorString(cudaGetLastError()));
    cudaFree(d);
}

I use OpenMP to start two threads, and each thread allocates memory on one device.

The problem is: if bytes = 400M, the output is (0,no error) (1,no error), but if bytes = 500M, the result is (0,no error) (1,out of memory). Why? Can someone help me? Thank you.

I am going to guess you have an active display running on the GPU which is running out of RAM. There is a driver API function cuMemGetInfo() which you can use to check the amount of free memory available on each device. You might be surprised how much a “modern” display manager (like WDDM + Aero, or Aqua or X11 + Compiz) uses.
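Here is a minimal sketch of that check folded into your per-thread code. It assumes the runtime API's cudaMemGetInfo() (which takes size_t pointers) is available in your toolkit; if it is not, the driver API call discussed below does the same job.

#include <stdio.h>
#include <omp.h>
#include <cuda_runtime.h>

int main(void)
{
    int count;
    cudaGetDeviceCount(&count);
    omp_set_num_threads(count);
    #pragma omp parallel
    {
        int deviceIdx = omp_get_thread_num();
        cudaSetDevice(deviceIdx);

        // Report free/total memory on this device before the big cudaMalloc.
        size_t free_b, total_b;
        cudaMemGetInfo(&free_b, &total_b);
        printf("device %d: %.1f MB free of %.1f MB\n", deviceIdx,
               free_b / 1048576.0, total_b / 1048576.0);
    }
    return 0;
}

If one device shows noticeably less free memory than the other, that is most likely the one driving your display.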

Thank you for your reply, but when I use the driver API, an error occurs.

#include "cuda.h"
#include "stdio.h"

int main()
{
    unsigned int f, g;
    cuMemGetInfo(&f, &g);   /* pass the addresses, not the values */
    return 0;
}

When I compile, it does not go through:

1. unresolved external symbol _cuMemGetInfo@8

2. fatal error: one unresolved external

unsigned int free_mem, total_mem, used_mem;

cuMemGetInfo(&free_mem, &total_mem);
used_mem = total_mem - free_mem;
printf("before plan3d: total mem: %0.3f MB, free: %0.3f MB, used: %0.3f MB\n",
       ((double)total_mem)/1024.0/1024.0,
       ((double)free_mem )/1024.0/1024.0,
       ((double)used_mem )/1024.0/1024.0);

If you use Linux, you need to link with -lcuda.

If you use Windows, you need to link with C:/CUDA/lib64/cuda.lib (64-bit) or C:/CUDA/lib/cuda.lib (32-bit).
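For reference, here is a minimal complete driver-API program, as a sketch: it assumes the old unsigned int signature of cuMemGetInfo() from that era's toolkits (newer toolkits take size_t pointers). Note that the driver API also needs cuInit() and a current context before cuMemGetInfo() will return anything useful:

#include <stdio.h>
#include <cuda.h>

int main(void)
{
    unsigned int free_mem, total_mem;
    CUdevice dev;
    CUcontext ctx;

    cuInit(0);                      /* initialise the driver API             */
    cuDeviceGet(&dev, 0);           /* first GPU of the GTX 295              */
    cuCtxCreate(&ctx, 0, dev);      /* cuMemGetInfo needs a current context  */

    cuMemGetInfo(&free_mem, &total_mem);
    printf("free: %.1f MB, total: %.1f MB\n",
           free_mem / 1048576.0, total_mem / 1048576.0);

    cuCtxDestroy(ctx);
    return 0;
}

On Linux this builds with something like gcc meminfo.c -I/usr/local/cuda/include -lcuda; on Windows, add cuda.lib to the linker inputs as described above.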

Now I am sure that using OpenMP like that is not correct: the result is that the two threads allocate memory on the same device. But the SDK sample "cudaOpenmp" does it that way. What should I do if I want two threads to allocate memory on different devices?
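One way to see what is actually happening (a sketch, not a definitive answer) is to ask the runtime which device each thread really ended up on, using cudaGetDevice() right after cudaSetDevice():

#pragma omp parallel
{
    int deviceIdx = omp_get_thread_num();
    cudaSetDevice(deviceIdx);

    int current = -1;
    cudaGetDevice(&current);    // which device is this thread actually bound to?
    printf("thread %d -> device %d\n", deviceIdx, current);
}

If both threads report device 0, then cudaSetDevice() is failing in one of them (check its return code); if they report 0 and 1, the binding is fine and the second allocation really is hitting the free-memory limit on that GPU.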