GTX 295: global memory out of memory

yhs1003 · October 22, 2009, 2:17pm

Hi, I have a problem with a GTX 295.
This is my code:

int count;
cudaGetDeviceCount(&count);
omp_set_num_threads(count);
#pragma omp parallel
{
    int deviceIdx = omp_get_thread_num();
    cudaSetDevice(deviceIdx);
    int bytes = 400000000;
    float *d;
    cudaMalloc((void **)&d, bytes);
    printf("(%d,%s)", deviceIdx, cudaGetErrorString(cudaGetLastError()));
    cudaFree(d);
}

I use OpenMP to start two threads, and each thread allocates memory on one device.

The problem is that if bytes = 400M, the output is (0,no error) (1,no error), but if bytes = 500M, the result is (0,no error) (1,out of memory). Why? Can someone help me? Thank you.

avidday · October 22, 2009, 2:55pm

I am going to guess you have an active display running on the GPU which is running out of RAM. There is a driver API function cuMemGetInfo() which you can use to check the amount of free memory available on each device. You might be surprised how much a “modern” display manager (like WDDM + Aero, or Aqua or X11 + Compiz) uses.

yhs1003 · October 23, 2009, 6:56am

Thank you for your reply. But when I use the driver API, an error occurs:

#include "cuda.h"
#include "stdio.h"

main()
{
    unsigned f,g;
    cuMemGetInfo(f,g);
}

When I build it, it fails with:

1. unresolved external symbol _cuMemGetInfo@8

2. fatal error: unresolved externals

LSChien · October 23, 2009, 7:24am

unsigned int free_mem, total_mem, used_mem;

cuMemGetInfo( &free_mem, &total_mem );
used_mem = total_mem - free_mem;

printf("before plan3d:total mem: %0.3f MB, free: %0.3f MB, used : %0.3f MB\n",
       ((double)total_mem)/1024.0/1024.0,
       ((double)free_mem )/1024.0/1024.0,
       ((double)used_mem )/1024.0/1024.0 );

If you use Linux, you need to link with -lcuda.

If you use Windows, you need to link with C:/CUDA/lib64/cuda.lib (64-bit) or C:/CUDA/lib/cuda.lib (32-bit).

yhs1003 · October 23, 2009, 10:45am

Now I am sure that using OpenMP like that is not correct: the result is that both threads allocate memory on the same device. But the SDK sample “cudaOpenMP” does it the same way. What should I do if I want two threads to allocate memory on different devices?
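
The same-device conclusion is consistent with simple arithmetic, which can be sketched in plain C. This is only a rough check under stated assumptions: it takes 896 MiB per GPU (the published per-GPU figure for the GTX 295), assumes both allocations are live at the same time, and ignores whatever the display and the CUDA context already use, so the real limit is lower. The names fits_on_one_gpu and report are hypothetical helpers, not part of any CUDA API.

```c
#include <stdio.h>

/* Assumption: 896 MiB per GPU, the published figure for the GTX 295.
   Display and context overhead is ignored, so real free memory is lower. */
#define ASSUMED_BYTES_PER_GPU (896ULL * 1024ULL * 1024ULL)

/* Hypothetical helper: returns 1 if `threads` simultaneous allocations of
   `bytes_each` bytes fit together on a single GPU, 0 otherwise. */
int fits_on_one_gpu(unsigned threads, unsigned long long bytes_each)
{
    return (unsigned long long)threads * bytes_each <= ASSUMED_BYTES_PER_GPU;
}

/* Hypothetical helper: prints the verdict for two threads sharing one GPU. */
void report(unsigned long long bytes_each)
{
    printf("2 x %llu bytes on one GPU: %s\n", bytes_each,
           fits_on_one_gpu(2, bytes_each) ? "fits" : "out of memory");
}
```

Calling report(400000000ULL) reports that the two allocations fit (800,000,000 bytes against 939,524,096), while report(500000000ULL) reports out of memory (1,000,000,000 bytes). That matches the pattern in the first post if both threads really land on the same GPU; a single 500,000,000-byte allocation per GPU would fit under the same assumption.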