Custom memory allocator for CUDA desired

I noticed that my application has significant overhead from the allocation/de-allocation routines (cudaMallocPitch, cudaFree), because I need a lot of temporary images (pitch-linear memory), etc.

This will presumably get worse in the future, because the execution time of my kernels will go down (faster GPUs) while the time for allocation/de-allocation stays constant.

I am wondering if there is a nice open-source custom memory allocator for CUDA, holding a memory pool or something similar. Ideally it would be geared towards allocation/de-allocation of images (which can be tens of megabytes in size).

I know there is a custom memory allocator in the ‘CUB’ library (github.com/nvidia/cub).
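As far as I can tell, using CUB’s caching allocator looks roughly like the sketch below (it caches linear cudaMalloc memory in size bins, so for a pitched image one would have to compute the pitch manually):

```cpp
#include <cub/util_allocator.cuh>

// One caching allocator; freed blocks are kept in size bins and handed
// out again instead of calling cudaMalloc/cudaFree every time.
cub::CachingDeviceAllocator g_allocator;

void process(size_t bytes, cudaStream_t stream)
{
    void *d_tmp = nullptr;
    g_allocator.DeviceAllocate(&d_tmp, bytes, stream); // cached cudaMalloc
    // ... launch kernels that use d_tmp on 'stream' ...
    g_allocator.DeviceFree(d_tmp);                     // returns block to the pool
}
```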

Is there some other useful allocator available for CUDA? It could also be an allocator for CPU memory, if it can easily be modified (replacing the CPU allocation/free routines with GPU allocation/free routines).

I assume you have already examined the possibility of re-using existing allocations to avoid malloc/free cycles? A faster CPU might also help, since allocations involve mostly administrative overhead on the host. I have never measured the speed of CUDA allocations relative to CPU speed, though, so I am not sure how much impact that has. My general advice is to pair fast GPUs with CPUs that have the highest single-thread performance, to avoid becoming bottlenecked on serial tasks (at the moment this means CPUs with >= 3.5 GHz).
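For a single recurring temporary image, the simplest form of re-use is a grow-only scratch buffer that only calls cudaMallocPitch when the requested size exceeds the current capacity. A minimal sketch (the ScratchImage name is made up for illustration):

```cpp
#include <cuda_runtime.h>

struct ScratchImage {
    void  *ptr    = nullptr;
    size_t pitch  = 0;
    size_t width  = 0;   // allocated row width in bytes
    size_t height = 0;   // allocated number of rows

    // Reuses the existing allocation if it is big enough,
    // otherwise frees it and allocates a larger one.
    cudaError_t acquire(size_t widthBytes, size_t h)
    {
        if (ptr && widthBytes <= width && h <= height)
            return cudaSuccess;                  // reuse, no API call
        if (ptr) cudaFree(ptr);
        width = widthBytes; height = h;
        return cudaMallocPitch(&ptr, &pitch, widthBytes, h);
    }

    ~ScratchImage() { if (ptr) cudaFree(ptr); }
};
```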

Custom memory allocators are exactly that: customized for each use case. This makes it unlikely that there is code out there that does exactly the right thing for your application. I have in the past written simple sub-allocators or memory-pool implementations on CPUs in about one work day, so “rolling your own” seems like a realistic option.
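As a rough illustration of what such a sub-allocator could look like for the image case, here is a toy pool that caches freed pitched blocks per (width, height) and serves repeat requests without touching the CUDA API. A sketch only, not production code; in particular it is not thread-safe:

```cpp
#include <cuda_runtime.h>
#include <map>
#include <vector>

class PitchedPool {
    struct Block { void *ptr; size_t pitch; };
    // (widthBytes, height) -> cached free blocks of exactly that size
    std::map<std::pair<size_t, size_t>, std::vector<Block>> free_;
public:
    cudaError_t alloc(void **ptr, size_t *pitch, size_t widthBytes, size_t height)
    {
        auto it = free_.find({widthBytes, height});
        if (it != free_.end() && !it->second.empty()) {
            Block b = it->second.back();         // serve from the pool
            it->second.pop_back();
            *ptr = b.ptr; *pitch = b.pitch;
            return cudaSuccess;
        }
        return cudaMallocPitch(ptr, pitch, widthBytes, height);
    }

    void release(void *ptr, size_t pitch, size_t widthBytes, size_t height)
    {
        // Cache instead of cudaFree; call trim() to really release memory.
        free_[{widthBytes, height}].push_back({ptr, pitch});
    }

    void trim()
    {
        for (auto &kv : free_)
            for (Block &b : kv.second) cudaFree(b.ptr);
        free_.clear();
    }
};
```

Because image sizes in a processing pipeline tend to repeat from frame to frame, even this exact-size matching already eliminates most malloc/free cycles after warm-up.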

Update: Section 4.3 of the Baidu paper (http://arxiv.org/pdf/1512.02595v1.pdf) gives hints on this topic (when dealing with ‘large’ GPU memory allocations).
A simple (CPU) implementation of the ‘buddy’ memory allocator can be found at github.com/cloudwu/buddy (or in answers to a “Buddy Memory Allocation” question), and related slides by an NVIDIA engineer are at http://iwcse.phys.ntu.edu.tw/parallel/Oct17/Jon-Yu_Lee_131017.pptx
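If I understand the buddy scheme correctly, its key trick is that block sizes are powers of two, so a block’s buddy (its coalescing partner) is found by XOR-ing the block’s offset with its size, making merges on free O(1). A small illustration (these helpers are mine, not from the linked repository):

```cpp
#include <cstddef>

// Smallest power of two >= n; requests are rounded up to a block size.
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Offset of the sibling block at the same level; if the buddy is also
// free, the two can be merged into one block of twice the size.
static size_t buddy_of(size_t offset, size_t block_size)
{
    return offset ^ block_size;
}
```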
And there is a nice paper from HPG14 (http://www.fi.muni.cz/~xvinkl/articles/hpg2014.pdf); slides are at http://www.highperformancegraphics.org/2014/wp-content/uploads/sites/3/2014/07/Vinkler-Allocator.pdf and the code (BSD license) of their ‘CMalloc’ allocator can be found at http://decibel.fi.muni.cz/~xvinkl/CMalloc/
ScatterAlloc (github.com/ComputationalRadiationPhysics/scatteralloc) and the newer mallocMC (github.com/alpaka-group/mallocMC) are open source under an MIT license, but seem to be geared towards small and repetitive allocations (according to the HPG14 paper).
I think I will use the ‘CMalloc’ allocator from the HPG14 paper.