I want to wrap the device-side malloc() function with a function of the same name to enable additional functionality (like logging) without changing the calling code. However, I have no idea how to reach the original malloc.
There is currently no support for dynamic linking on the GPU, so the device-side libc functions (malloc(), free(), printf()) are linked differently from their CPU counterparts. The GPU linking mechanism gives the user no way to intercept the call without changing the call site.
What kind of information are you trying to capture with this interception layer? If you are trying to debug something in the device-side malloc() calls, have you tried using cuda-memcheck?
Hi vyas!
Unfortunately, debugging/cuda-memcheck is not what this is about. I’m trying to build an abstraction for malloc, so that it becomes possible to switch the underlying memory allocator to a different implementation (e.g. a pool-based allocator). At the same time, there should be a way to use the original malloc() internally.
Thanks for your code! This approach works well, but I had hoped to call my wrapper function “malloc”, so that the calling code doesn’t need to be changed. Also, as far as I could tell from my testing, overriding malloc() automagically overrides operator new() and operator new[]() as well. Choosing a new name like “myMalloc” makes it impossible for users of this library to get memory from this allocator via new.
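For anyone reading along, here is a rough sketch of what the renamed-wrapper approach could look like. It is not the code posted earlier in the thread. The names myMalloc, myFree, poolAlloc, runDemo and the USE_POOL_ALLOCATOR macro are all made up for illustration, and the "pool" is just a trivial bump allocator standing in for a real one. The only CUDA features it relies on are the built-in device-side malloc()/free()/printf() and atomicAdd().

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical stand-in for a pool allocator: a fixed buffer in device
// memory that is handed out in 16-byte-aligned chunks and never reclaimed.
__device__ __align__(16) char g_pool[1 << 16];
__device__ unsigned int g_poolOffset = 0;

__device__ void* poolAlloc(size_t size)
{
    unsigned int aligned = static_cast<unsigned int>((size + 15) & ~size_t(15));
    unsigned int old = atomicAdd(&g_poolOffset, aligned);
    return (old + aligned <= sizeof(g_pool)) ? g_pool + old : nullptr;
}

// Wrapper with a new name (hypothetical). The backend is picked at compile
// time; the default path forwards to the built-in device-side malloc().
__device__ void* myMalloc(size_t size)
{
#ifdef USE_POOL_ALLOCATOR
    void* p = poolAlloc(size);
#else
    void* p = malloc(size);
#endif
    printf("myMalloc(%llu) -> %p\n", static_cast<unsigned long long>(size), p);
    return p;
}

__device__ void myFree(void* p)
{
#ifdef USE_POOL_ALLOCATOR
    (void)p;   // the bump allocator above never releases memory
#else
    free(p);   // built-in device-side free()
#endif
}

// Each thread allocates one int through the wrapper and counts successes.
__global__ void demoKernel(int* okCount)
{
    int* p = static_cast<int*>(myMalloc(sizeof(int)));
    if (p != nullptr) {
        *p = threadIdx.x;
        atomicAdd(okCount, 1);
        myFree(p);
    }
}

// Host helper (hypothetical): launches the kernel and returns how many
// threads obtained a non-null pointer. Error checking omitted for brevity.
int runDemo(int threads)
{
    int* dCount = nullptr;
    int hCount = 0;
    cudaMalloc(&dCount, sizeof(int));
    cudaMemset(dCount, 0, sizeof(int));
    demoKernel<<<1, threads>>>(dCount);
    cudaMemcpy(&hCount, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dCount);
    return hCount;
}
```

Calling runDemo(4) from main() launches four threads, each of which allocates through the wrapper. Building with -DUSE_POOL_ALLOCATOR switches the backend without touching the kernel. The drawback discussed above still applies: callers have to use myMalloc explicitly, and plain new does not go through it.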
Well, I guess there is not really a more elegant way for now.