I’m playing with the low-level memory management API and I’ve come across an issue for my use case. The virtual address (VA) reservations made by cuMemAddressReserve do not seem to be consistent or reproducible. Docs are here: CUDA Driver API :: CUDA Toolkit Documentation
Here’s the test I am doing: I launch a process that allocates 5 buffers of 16MB of physical memory each. For each buffer it requests a reservation with addr set to 0, which means no specific VA range is required, and it writes the 5 resulting addresses to a file.
After this process finishes, I launch another one that reads the 5 pointers from the file and tries to reserve the exact same VAs, passing them as the ptr argument to cuMemAddressReserve. The issue is that I was not able to do this because the VAs returned are not predictable. Sometimes they are close and sometimes they are wildly different.
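The check done in the second process can be sketched as follows. This is a minimal C++ sketch, not the actual test program: `ReserveFn`, `ReserveResult`, `reserve_at_hints` and `count_mismatches` are hypothetical names introduced for illustration. In the real program, `ReserveFn` would presumably wrap a call such as `cuMemAddressReserve(&ptr, size, 0, hint, 0)` and return `ptr`. The reservation call is passed in as a parameter so that the comparison logic can be exercised without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the driver call. In the real program this would
// wrap cuMemAddressReserve(&ptr, size, 0, hint, 0) and return ptr.
using ReserveFn = std::function<std::uint64_t(std::size_t size, std::uint64_t hint)>;

struct ReserveResult {
    std::uint64_t requested;  // address read from the file
    std::uint64_t actual;     // address the reservation call returned
    bool honored() const { return requested == actual; }
};

// Request one reservation per saved address and record what came back.
std::vector<ReserveResult> reserve_at_hints(const std::vector<std::uint64_t>& hints,
                                            std::size_t size,
                                            const ReserveFn& reserve) {
    assert(size != 0);  // a zero-sized reservation makes no sense here
    std::vector<ReserveResult> results;
    results.reserve(hints.size());
    for (std::uint64_t hint : hints) {
        results.push_back({hint, reserve(size, hint)});
    }
    return results;
}

// Count the reservations whose returned address differs from the request.
std::size_t count_mismatches(const std::vector<ReserveResult>& results) {
    std::size_t mismatches = 0;
    for (const ReserveResult& r : results) {
        if (!r.honored()) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

The point of the sketch is that the returned address is always compared against the requested one and never assumed to match, since the runs described here show that it can differ.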
For example, this is the output of the second process, where the first 5 pointers are the ones we wanted, and the following 5 are what cuMemAddressReserve returned.
read from file: 7fc400000000
read from file: 7fc401000000
read from file: 7fc402000000
read from file: 7fc403000000
read from file: 7fc404000000
7fc400000000
7fc402000000
7fc403000000
7fc404000000
7fc405000000
In this case the VAs are really close, but for some reason one 16MB block (the one at 7fc401000000) is skipped by the CUDA RT, and I can’t come up with an explanation for it.
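To make the size of the gap concrete, here is a small C++ sketch that computes the distance between each requested address and the one returned in the first run. The helper `offsets_in_mib` and the constant names are hypothetical and exist only for this illustration. The addresses are copied from the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;

// Signed distance, in MiB, from each requested address to the returned one.
std::vector<std::int64_t> offsets_in_mib(const std::vector<std::uint64_t>& requested,
                                         const std::vector<std::uint64_t>& returned) {
    assert(requested.size() == returned.size());
    std::vector<std::int64_t> out;
    out.reserve(requested.size());
    for (std::size_t i = 0; i < requested.size(); ++i) {
        const std::int64_t diff = static_cast<std::int64_t>(returned[i]) -
                                  static_cast<std::int64_t>(requested[i]);
        out.push_back(diff / static_cast<std::int64_t>(kMiB));
    }
    return out;
}

// Addresses from the first run: what was read from the file...
const std::vector<std::uint64_t> kRequested = {
    0x7fc400000000ULL, 0x7fc401000000ULL, 0x7fc402000000ULL,
    0x7fc403000000ULL, 0x7fc404000000ULL};
// ...and what cuMemAddressReserve returned for each request.
const std::vector<std::uint64_t> kReturned = {
    0x7fc400000000ULL, 0x7fc402000000ULL, 0x7fc403000000ULL,
    0x7fc404000000ULL, 0x7fc405000000ULL};
```

Calling `offsets_in_mib(kRequested, kReturned)` gives 0 for the first reservation and 16 for the other four. Every reservation after the first is shifted by exactly one 16MB slot (0x1000000 bytes).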
If I run it again I can see something like this:
read from file: 7fc400000000
read from file: 7fc401000000
read from file: 7fc402000000
read from file: 7fc403000000
read from file: 7fc404000000
7fdbf2800000
7fdbf6600000
7fc402000000
7fc404000000
7fc405000000
Now the first two pointers are completely different. It seems that even though the first process has finished before I launch the second one, there is some residual state on the GPU. Is this true?
Is cuMemAddressReserve supposed to be non-deterministic? Is there a way to make it deterministic?