I am planning to start work on a host-side CUDA memory-allocator library in modern C++ in the coming months.
It will be focused on my research into using GPUs in analytic DBMSes, which means:
- Most memory taken up by a small number of very large areas
- Areas intended for smaller data can be allowed to fragment in exchange for better allocation speed (since they are all released together when a query's execution ends)
- Mostly prospective allocation, i.e. "I will need X MB between sequence points t_1 and t_2", or perhaps even some linkage to a task-execution graph detailing allocation lifetimes.
- Since allocation time does not grow as the data scales up, performance will at first be a secondary consideration relative to API neatness and features. Of course, I would still rather make it as fast as I can.
The chances of me actually going through with this work are, say, 70%. If you are interested in collaborating, I would be very glad to do this together with others, and of course to let the focus shift somewhat towards their personal or group interests, as long as it remains useful enough for me.