Can we force all allocations from cudaMalloc to fall within a specific virtual address range? Also, is there a way to pre-allocate a buffer and force any calls to cudaMalloc inside a kernel to allocate only within that buffer?
To my knowledge: no and no. That is hardly surprising; malloc-type allocators are in general not designed for this level of control.
It is not clear what you are trying to accomplish, but generally speaking, if an application needs very specific behavior from an allocator, the standard approach is to allocate one contiguous (in virtual address space) chunk of memory from the standard allocator (e.g. malloc or cudaMalloc) at application startup, then use an application-specific sub-allocator that implements whatever properties are desired for that chunk of memory.
From practical experience (I have done this multiple times for different types of apps), programming and testing a simple sub-allocator based on a free list takes a couple of hours. You could also look into creating a memory pool, a slab allocator, or a ring buffer, depending on what you are trying to accomplish.
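As a rough illustration of the idea, here is a minimal bump-style sub-allocator over a single cudaMalloc'd chunk. The names (SubAllocator, suballoc) are made up for this sketch; a real free-list version would also track and reuse freed blocks:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

// Minimal sketch of a host-side sub-allocator that hands out pieces of one
// big cudaMalloc'd chunk. A production version would add a free list,
// error handling, and thread safety.
struct SubAllocator {
    char  *base   = nullptr;   // start of the reserved chunk (device pointer)
    size_t size   = 0;         // total chunk size in bytes
    size_t offset = 0;         // bump pointer: next free byte

    bool init(size_t bytes) {
        size = bytes;
        return cudaMalloc(&base, bytes) == cudaSuccess;
    }

    // Bump allocation with 256-byte alignment (matches cudaMalloc's guarantee).
    void *suballoc(size_t bytes) {
        size_t aligned = (offset + 255) & ~size_t(255);
        if (aligned + bytes > size) return nullptr;   // out of space
        offset = aligned + bytes;
        return base + aligned;
    }

    void destroy() { cudaFree(base); base = nullptr; offset = size = 0; }
};

int main() {
    SubAllocator pool;
    if (!pool.init(64 << 20)) return 1;          // one contiguous 64 MiB chunk
    float *a = static_cast<float*>(pool.suballoc(1024 * sizeof(float)));
    float *b = static_cast<float*>(pool.suballoc(4096 * sizeof(float)));
    printf("a = %p, b = %p\n", (void*)a, (void*)b); // both fall inside the chunk
    pool.destroy();
    return 0;
}
```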
CUDA provides a mechanism to manage virtual locations of device memory allocations, but to my knowledge this has no bearing on in-kernel usage of malloc, new, or cudaMalloc.
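For reference, the mechanism referred to above is the CUDA virtual memory management API in the driver API (cuMemAddressReserve, cuMemCreate, cuMemMap, cuMemSetAccess). A rough sketch of reserving a virtual address range and mapping physical memory into it might look like the following; error checking is omitted, and the fixed-address value used here is only an illustrative request that the driver is free to ignore:

```cpp
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = dev;

    size_t gran = 0;
    cuMemGetAllocationGranularity(&gran, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    size_t size = 64 * gran;   // sizes must be multiples of the granularity

    // Reserve a virtual address range. The fixed start address (an arbitrary
    // illustrative value here) is a request, not a guarantee.
    CUdeviceptr va = 0;
    cuMemAddressReserve(&va, size, 0, (CUdeviceptr)0x700000000000ULL, 0);

    // Create physical memory and map it into the reserved range.
    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, size, &prop, 0);
    cuMemMap(va, size, 0, handle, 0);

    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(va, size, &access, 1);

    printf("mapped %zu bytes at %p\n", size, (void*)va);

    cuMemUnmap(va, size);
    cuMemRelease(handle);
    cuMemAddressFree(va, size);
    cuCtxDestroy(ctx);
    return 0;
}
```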
If you needed to do this inside the kernel, I don't know of any solution other than a roll-your-own allocator (i.e. what njuffa described).
Thank you.
That makes sense. I was thinking of using cuMemMap plus cuMemAddressReserve to create a big chunk and force allocations into a specific virtual address range. How about the code section/data section in the binary? Can we control where that is mapped? I want to force the code section to be at a specific virtual address.
I’m not aware of any method to do that.
In that case, is there a way to figure out where the code/stack is mapped to?
I’m not aware of any method to do that.
(You might be able to do something hacky, like getting a pointer to a device function in device code and then making some sort of guess based on that. However, from my perspective, this topic never comes up in the CUDA programming I am familiar with, so I don't know the intent here.)
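For completeness, the hacky approach alluded to above might look something like the sketch below. The printed value is an opaque code-space address, so any conclusion drawn from it is a guess:

```cpp
#include <cstdio>

__device__ void probe() {}   // arbitrary device function to take the address of

__global__ void print_code_pointer() {
    // Take a device-side function pointer and print its value. What this
    // address means relative to the module's code placement is unspecified,
    // so this is only a rough probe, not a reliable memory-map entry.
    void (*fp)() = probe;
    printf("device function pointer: %p\n", (void*)fp);
}

int main() {
    print_code_pointer<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```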
I just want to get a complete view of the memory allocations and where they are mapped. I was able to find all UVM ranges that were allocated by looking at the open-source GPU device driver, but I am unsure where the kernel code is loaded and whether it is exposed through UVM or not.
For what purpose? At this point this looks like an XY problem. You might get some relevant tips if you mention what it is you are actually trying to accomplish.
There is really no specific purpose (mostly my own curiosity). My end goal is to make a tool that visualizes GPU memory as a memory map, showing where things are allocated based on their virtual addresses.