I ran into a very strange issue. cudaMallocPitch(&ptr, &pitch, 40000, 5000) suddenly started returning result code 999. I checked nvidia-smi and the memory usage on the GPU was low, about 10MB/8GB. I then restarted my machine and cudaMallocPitch started returning 0 again, as expected. The GPU is a 3070 Ti. What might have happened?
Gremlins? Kobolds? Leprechauns?
All joking aside, there is too little information to speculate intelligently. A common scenario behind “allocation failed although requested size is less than total available memory” is fragmentation of the free memory, but given the extremely low memory usage in this instance, it seems highly unlikely.
Error code 999 is cudaErrorUnknown. My guess is therefore that this could be caused by some sort of hibernation (sleep mode) where the previous state of the CUDA driver was not successfully restored in full upon waking. CUDA may detect some inconsistencies in its internal state, causing it to throw this error.
Gotcha. How would I “reset” the GPU so to speak without restarting the machine, to eliminate the possibility of things like hibernation, fragmentation, etc. causing this? I’m writing this code to run on a server, where I want it to be left unattended and able to recover itself to some extent.
In my limited experience, servers do not enter into any kind of sleep mode. The goal of server operators is to keep such expensive hardware well utilized, e.g. never below 25% utilization.
Complications from sleep mode are primarily an issue on laptops, which enter sleep modes to conserve energy. Note that CUDA error 999 being caused by issues when waking from sleep is just a guess on my part.
I don’t have detailed insights into sleep modes. As far as I know, modern systems provide multiple sleep modes of different “depth”. From my experience with Windows 10 (completely independent of any use of CUDA), there can be issues with waking that usually require a system reboot to fix. I have solved this on my own systems by disallowing sleep of any kind (sorry, I do not recall the relevant configuration settings for this). Whether issues with sleep/wake cycling are the fault of the hardware, OS, drivers, apps, or all of the above, I do not know.
Memory fragmentation is typically something that happens when a memory allocator (of any kind, e.g. CUDA, C++ runtime, operating system) runs over an extended period of time and many allocations plus de-allocations are performed. It is the latter part that can convert an allocator's memory map into Swiss cheese in the long run. It usually does not become an issue unless memory use nears the limit of capacity.
The first line of defense is to avoid frequent allocation / de-allocation, minimize the number of different sizes used in allocation requests, and never plan on using more than 80% of the total available memory.
If that is not doable, you may want to restart the app when memory allocations start failing. My expectation here is that long-running applications (think BOINC, Folding@Home, etc.) have a checkpoint mechanism that allows for restarts.
If that is not acceptable, you can have the app grab all required memory at the start of the app, then parcel out the memory using your own app-specific allocator. That could be a slab allocator, a memory pool, a ring buffer, really anything you desire that makes sense for the use case. Custom allocators can be developed in as little as a couple of days; I have written three or four over the years.