CUDA code fails with "out of memory"

I am completely new to CUDA. I am working on an x86_64 RHEL 4.3 machine with:
2 x Xeon E5462 2.8 GHz quad-core
8 GB (2 x 4) DDR2 memory
NVIDIA Quadro FX 570 GPU (16 cores)
I am using CUDA 2.1 and the compatible SDK 2.1.

  1. Initial testing in ~/NVIDIA_CUDA_SDK/bin/linux/release using standard exe files

./deviceQuery, ./bandwidthTest, ./particles, ./oceanFFT etc. all work fine.

Some fail:
./transpose, ./matrixMul etc.
cudaSafeCall() Runtime API error in file <>, line 113 : out of memory.

  2. A .cu program fails during execution (after successful compilation) with the following error

NUmber of Devices : 1

Using device 0: Quadro FX 570
ERROR at line :87.2’ ’ out of memory

The transpose example needs 384x384*sizeof(float) bytes per matrix, that is, 589824 bytes (576 KBytes).

The NVIDIA Quadro FX 570 comes with 256 or 512 MBytes of global memory (not sure which). If the example really is running out of memory, you can try reducing the problem size to 256x256.
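
To double-check the size arithmetic, here is a minimal host-side sketch in plain C. The helper names `footprint_bytes` and `bytes_to_kib` are made up for illustration and are not part of the SDK; the buffer count is an assumption (a transpose would typically hold an input and an output matrix on the device).

```c
#include <stddef.h>

/* Bytes needed for `buffers` float matrices of rows x cols elements.
   Hypothetical helper, not an SDK function. */
size_t footprint_bytes(size_t rows, size_t cols, size_t buffers)
{
    return rows * cols * sizeof(float) * buffers;
}

/* Convert a byte count to KiB (1 KiB = 1024 bytes). */
double bytes_to_kib(size_t bytes)
{
    return (double)bytes / 1024.0;
}
```

With a 4-byte float, `footprint_bytes(384, 384, 1)` gives 589824 bytes, which `bytes_to_kib` reports as 576 KiB. The 256x4096 case works out to 4 MiB per matrix, so either size is small next to 256 MB of global memory.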

Hope this helps, and welcome to CUDA!


That is right.

The Quadro FX 570 features 256 MB of global memory. This is confirmed by ./deviceQuery.
I checked the code located in projects. The matrix size by default
was 256x4096. I tried reducing it to 256x256; it failed, as it did for several lower values.
Indeed, it failed even for 10x10.

When the SDK is compiled in emulation mode (make emu=1), ./transpose runs fine without any problem.

Maybe 256x256 is quite close to the limit of your global memory.

Are the lower values multiples of the block size, e.g. 128x128, 64x64, 32x32? Some SDK examples are written for fixed values and can produce unexpected results when a change doesn't meet some requirement.
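
One way to test the block-size question before launching anything is a small host-side check. This is only a sketch: `fits_tiles` is a made-up helper, and the tile width of 16 is an assumption, so substitute whatever block dimension the example source actually defines.

```c
#include <stdbool.h>

/* Assumed tile width; replace with the block dimension that the
   example's source really uses. */
#define TILE_DIM 16u

/* True when both dimensions are non-zero whole multiples of the tile width.
   Hypothetical helper, not an SDK function. */
bool fits_tiles(unsigned rows, unsigned cols, unsigned tile)
{
    return tile != 0 && rows != 0 && cols != 0
        && rows % tile == 0 && cols % tile == 0;
}
```

Under that assumed width, 128x128 and 64x64 pass the check, while a size such as 10x10 does not.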

Do not give up! ;)


The lower values that I tried are 128x128, 32x256, 32x128, 64x64, 64x128 and 32x64. The example code gives no clue about any such requirement. In each case the error is the same.

cudaSafeCall() Runtime API error in file <>, line 113 : out of memory.

I’ll keep trying. Thanks