I noticed that my application has significant overhead from the allocation/de-allocation routines (cudaMallocPitch, cudaFree). I need a lot of temporary images (pitch-linear memory), etc…
I suppose this will only get worse in the future, because the execution time of my kernels will go down (faster GPUs) while the time for allocation/de-allocation stays constant.
I am wondering if there is some nice open-source custom memory allocator for CUDA, holding a memory pool or something like that. Ideally one geared towards allocation/de-allocation of images (which can be tens of megabytes in size).
I know there is a custom memory allocator in the CUB library (https://github.com/NVlabs/cub).
Is there some other useful allocator available for CUDA? It could also be an allocator for CPU memory, provided it could easily be modified (replacing the CPU allocation/free routines with GPU allocation/free routines).
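To illustrate what I mean, here is a minimal sketch of the kind of caching pool I have in mind. This is hypothetical code of my own, using plain malloc/free as stand-ins for cudaMallocPitch/cudaFree: freed blocks are kept in per-size free lists and handed back out on the next request, so the expensive underlying allocator is hit far less often.

```cpp
#include <cstdlib>
#include <map>
#include <vector>

// Sketch of a caching pool: freed blocks are cached in per-size free
// lists and reused, so the underlying allocator (malloc/free here,
// cudaMallocPitch/cudaFree in the real thing) is called far less often.
class CachingPool {
public:
    void* allocate(std::size_t bytes) {
        auto& list = free_lists_[bytes];
        if (!list.empty()) {               // reuse a cached block
            void* p = list.back();
            list.pop_back();
            return p;
        }
        return std::malloc(bytes);         // fall back to the real allocator
    }

    void deallocate(void* p, std::size_t bytes) {
        free_lists_[bytes].push_back(p);   // cache instead of freeing
    }

    ~CachingPool() {                       // release everything at shutdown
        for (auto& kv : free_lists_)
            for (void* p : kv.second)
                std::free(p);
    }

private:
    std::map<std::size_t, std::vector<void*>> free_lists_;
};
```

For images I would presumably key the cache on (width, height, pitch) rather than a raw byte size, and a real GPU version would also need a policy for trimming the cache when device memory runs low, which is why I am hoping an existing library already solves this.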