How to solve a memory allocation problem in CUDA?

Fatal error: Failed to allocate device buffer. (out of memory at …/src/programname:linenumber)

My 3D array is 20 × 200 × 200, and for each value in the array the computation returns 1331 outcomes (one for location and one for difference). Hence, I have to pass a total of three arrays to the GPU, of which one is of size 20 × 200 × 200 and the other two are 20 × 200 × 200 × 1331.

So I think this much memory cannot be allocated on the GPU. Is there any other way to handle this problem?

I have searched the internet and couldn't find any satisfactory solution. I should mention that I am using CUDA 6 on a 64-bit Linux system with 64 GB of RAM and a 2 TB hard disk. I have checked the memory status and it shows only 3% in use. I used the cudaMalloc function to allocate memory, but the problem still arises. Any help in this regard would be much appreciated. Thanks in advance.

If we assume your arrays hold 32-bit floats (4 bytes each), each of your four-dimensional arrays would require

20 × 200 × 200 × 1331 × 4 bytes ≈ 4.26 GB

So it really depends on the hardware you are using. Newer Tesla cards have between 12 and 24 GB of memory, while older cards might only have 2–4 GB.
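You can check how much device memory is actually available at runtime with `cudaMemGetInfo` before attempting the big allocations. A minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    // Query free and total memory on the current CUDA device.
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device memory: %.2f GB free of %.2f GB total\n",
           freeBytes / 1e9, totalBytes / 1e9);
    return 0;
}
```

If the reported free memory is well below the ~8.5 GB your two large arrays need together, the out-of-memory error is expected and batching (below) is the way forward.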

The common approach to getting around this problem is to break the work into batches. If, as you say, you are simply computing differences, you can pass in small sections of your array at a time and process each section individually.
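As an illustration of the batching idea, here is a minimal sketch that processes one 200 × 200 slice of the 20 × 200 × 200 input at a time; the kernel body and the choice of slicing along the first dimension are assumptions, since the original computation isn't shown:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: for each input value, produce 1331 outcomes.
// Replace the body with the real location/difference computation.
__global__ void computeOutcomes(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        for (int k = 0; k < 1331; ++k)
            out[(size_t)i * 1331 + k] = in[i] + k;   // placeholder
    }
}

int main() {
    const int slices = 20, sliceElems = 200 * 200;
    const size_t inBytes  = sliceElems * sizeof(float);
    const size_t outBytes = (size_t)sliceElems * 1331 * sizeof(float);

    float *hIn  = (float *)malloc(slices * inBytes);   // full input on host
    float *hOut = (float *)malloc(outBytes);           // host buffer reused per slice
    for (int i = 0; i < slices * sliceElems; ++i) hIn[i] = (float)i;

    float *dIn, *dOut;                                 // device buffers for ONE slice
    cudaMalloc(&dIn, inBytes);
    cudaMalloc(&dOut, outBytes);                       // ~213 MB instead of ~4.26 GB

    for (int s = 0; s < slices; ++s) {
        cudaMemcpy(dIn, hIn + (size_t)s * sliceElems, inBytes,
                   cudaMemcpyHostToDevice);
        computeOutcomes<<<(sliceElems + 255) / 256, 256>>>(dIn, dOut, sliceElems);
        cudaMemcpy(hOut, dOut, outBytes, cudaMemcpyDeviceToHost);
        // ...consume or write out hOut for slice s before the next iteration...
    }

    cudaFree(dIn); cudaFree(dOut);
    free(hIn); free(hOut);
    return 0;
}
```

Each slice's output occupies 200 × 200 × 1331 × 4 bytes ≈ 213 MB on the device, so even a 2 GB card can handle it; only the host needs room for the full result (or you can stream each slice's output to disk).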

If even a single batch of your working set is too large to fit on the device, there isn't a lot more you can do.