Memory allocation reliability

Hi, hopefully someone can explain my problem.

We allocate two large blocks of memory, use a kernel to process the data, and then free the memory. This cycle repeats continuously for hours at a time, and after a varying number of iterations we get an 'unable to allocate memory' error. After much testing we have simplified the code to the attached memAlloc.cpp, which just allocates and then frees blocks of memory with no GPU kernel processing.

The pseudo code for the attached file is as follows:

Set block size to 140 MBytes
// allocate a large block of memory up front to ensure increasing the block size is possible
Allocate a single 420 MByte block of GPU memory
Free the 420 MByte block of GPU memory

Loop 1000 times or stop on error:
    Allocate a block of memory - block1
    Allocate a second block of memory - block2
    Free block1
    Free block2
    Increase the size of the blocks by 320 bytes

The memory allocation fails somewhere between 500 and 800 iterations.

The same program without changing the size of the allocated blocks runs for many minutes (forever?). A single block, whether fixed or changing in size, also runs for many minutes (forever?).

  • GPUs tested: 8800GTX, 9800GTX, GTX260
  • Driver 177.35
  • Windows XP Professional Service Pack 2 (32bit)
  • Intel Quad core PC
  • AMD Dual core PCs
memAlloc.cpp (2.2 KB)

heap fragmentation!

I’ve tested your code and it works without problems on my PC:

  • 8800 GTX
  • CUDA 2.0b2
  • Driver 177.83

Thanks for testing.

I will get driver 177.83 and try it. I already use CUDA 2.0b2.

I have tested your application and ran into the same problem. The application output is as follows:

Successful allocate 410156 KBytes memory
Successful free memory
Start loop allocating 2 memory blocks of 136718 KBytes and then free the blocks
The size of the blocks is increased by 320 for each iteration
loop count 100 size 136750 KBytes
loop count 200 size 136781 KBytes
loop count 300 size 136812 KBytes
loop count 400 size 136843 KBytes
loop count 500 size 136875 KBytes
loop count 600 size 136906 KBytes
Failed on allocating GPU memory gpudata1. cudaError message: unknown error
Exception error. Iteration 606
Press ENTER to exit...

  • GTX260
  • CUDA 2.0 Beta2
  • VS2005
  • Windows XP Professional SP2
  • Xeon E5410
  • Memory 4GB

The application segfaults for me:

gdb ../../bin/linux/release/seCudaMemSmoke
GNU gdb 6.6-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run
Starting program: /afs/kip.uni-heidelberg.de/user/mbach2/gpu-dev00/NVIDIA/cudaSDK/bin/linux/release/seCudaMemSmoke
[Thread debugging using libthread_db enabled]
[New Thread 47798758420832 (LWP 28537)]
Successful allocate 410156 KBytes memory
Successful free memory
Start loop allocating 2 memory blocks of 136718 KBytes and then free the blocks
The size of the blocks is increased by 320 for each iteration
loop count 100 size 136750 KBytes
loop count 200 size 136781 KBytes
loop count 300 size 136812 KBytes
loop count 400 size 136843 KBytes
loop count 500 size 136875 KBytes
loop count 600 size 136906 KBytes
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47798758420832 (LWP 28537)]
0x00002b790407d483 in ?? () from /usr/lib/libcuda.so
(gdb) backtrace
#0  0x00002b790407d483 in ?? () from /usr/lib/libcuda.so
#1  0x00002b7904070a23 in ?? () from /usr/lib/libcuda.so
#2  0x00002b79040650d9 in ?? () from /usr/lib/libcuda.so
#3  0x00002b7902ce370b in cudaMalloc () from /usr/local/cuda/lib/libcudart.so.2
#4  0x0000000000400c91 in main ()
(gdb)

Thanks to those who tested this problem.

I have now installed driver 177.83 and the test loop will now run for 10000+ iterations without failure.

GTX 260, 8800GTX, CUDA 2.0b2, Driver 177.83

IMHO, when you have frequent allocations/deallocations, it’s better to write your own allocator instead of using cudaMalloc/cudaFree. I wrote my own allocator to avoid fragmentation, and it also speeds up allocations/deallocations by a large factor.

Thanks, we have implemented our own memory management and all is well :-)