CUDA_OUT_OF_MEMORY despite large amounts of memory available

My application currently exhibits a problem where, on seemingly random occasions, CUDA_OUT_OF_MEMORY is returned by functions that, according to the documentation, do not return that error code.

The error occurs inconsistently, in roughly 10% of the runs of a larger application, and at various points in the computation. This makes it difficult to narrow down its source or provide a minimal example. However, I have logged the free-memory estimate reported by cuMemGetInfo before and after every allocation and free (a sketch of this kind of logging is shown below) and have gathered the following observations:

The first function to return the error code differs between runs (cuStreamCreate or cuEventRecord, for example), but the error is never returned by allocations or by operations that would be expected to require considerable amounts of memory. Note that cuEventRecord’s documentation, for instance, does not even list CUDA_OUT_OF_MEMORY as a potential return code.
When the error occurs, the most recent call to cuMemGetInfo, issued after the most recent allocation, estimates the free memory of the device to be on the order of a gigabyte, the lowest value observed so far being 742024807 bytes. A safety margin of this size is intended. Note that this should be the only application operating on that GPU at the time of the error, and that the GPU is in TCC mode. The cuMemGetInfo logs appear consistent with this: there is no indication of the free-memory estimate changing outside of allocations or frees. On the CPU side, there should also be an abundant amount of RAM available.
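For context, the logging amounts to a thin wrapper around each allocation and free. A minimal sketch of that idea (not my actual code; the function names are placeholders) looks roughly like this:

```cpp
// Minimal sketch of free-memory logging around driver API allocations.
#include <cuda.h>
#include <cstdio>

static void logFreeMem(const char *tag) {
    size_t freeB = 0, totalB = 0;
    CUresult r = cuMemGetInfo(&freeB, &totalB);
    if (r == CUDA_SUCCESS)
        std::printf("[%s] free = %zu bytes, total = %zu bytes\n", tag, freeB, totalB);
    else
        std::printf("[%s] cuMemGetInfo failed with code %d\n", tag, (int)r);
}

static CUresult loggedAlloc(CUdeviceptr *ptr, size_t bytes) {
    logFreeMem("before alloc");
    CUresult r = cuMemAlloc(ptr, bytes);  // the allocation being instrumented
    logFreeMem("after alloc");
    return r;
}
```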

This leaves me with the following three questions:
1. What are possible reasons for CUDA_OUT_OF_MEMORY being returned outside of allocations when large amounts of memory seem to be available?
2. How does it happen that cuEventRecord is the first function to return CUDA_OUT_OF_MEMORY, if that function is not intended to return that particular error code?
3. What options do I have to narrow down the source of the error in my particular case?

Are there any CUDA API calls with unchecked return values, especially ones that can return CUDA_ERROR_OUT_OF_MEMORY according to the documentation?

I have checked, and the only API calls whose return codes are not checked are cuDriverGetVersion during initialization, as well as cuGetErrorName and cuGetErrorString during error handling.
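For completeness, the checking is done with a wrapper of the usual kind. A simplified sketch (the macro name and the exit-on-error policy are placeholders, not my actual implementation):

```cpp
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

/* Hypothetical checking macro; name and exit-on-error policy are placeholders. */
#define CU_CHECK(call)                                                         \
    do {                                                                       \
        CUresult result_ = (call);                                             \
        if (result_ != CUDA_SUCCESS) {                                         \
            const char *name_ = nullptr;                                       \
            const char *desc_ = nullptr;                                       \
            cuGetErrorName(result_, &name_);   /* return values not checked */ \
            cuGetErrorString(result_, &desc_); /* as described above        */ \
            std::fprintf(stderr, "%s failed: %s (%s)\n", #call,                \
                         name_ ? name_ : "?", desc_ ? desc_ : "?");            \
            std::exit(EXIT_FAILURE);                                           \
        }                                                                      \
    } while (0)

/* usage: CU_CHECK(cuStreamCreate(&stream, CU_STREAM_DEFAULT)); */
```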

A CUDA out-of-memory error can be returned when a call requires establishing a device context on a device that you are “not using”, or have not yet established a context on, while some other user or process is using that device and has presumably consumed a large amount of its memory.

This particular issue can only occur in a multi-GPU scenario, and only when more than one user or process is using the GPUs.

An example/write-up is here.
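If you want to check whether this could apply to your system, one option is to probe the free memory on every device the process can see. A rough driver API sketch, written for illustration only:

```cpp
// Untested sketch: create a context on each visible device in turn and
// report how much memory is free there. Context creation itself may fail
// with CUDA_ERROR_OUT_OF_MEMORY if a device is already exhausted, which is
// informative in its own right.
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    int count = 0;
    cuDeviceGetCount(&count);
    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        cuDeviceGet(&dev, i);
        CUcontext ctx;
        CUresult r = cuCtxCreate(&ctx, 0, dev);
        if (r != CUDA_SUCCESS) {
            const char *name = nullptr;
            cuGetErrorName(r, &name);
            std::printf("device %d: cuCtxCreate failed (%s)\n", i, name ? name : "?");
            continue;
        }
        size_t freeB = 0, totalB = 0;
        cuMemGetInfo(&freeB, &totalB);
        std::printf("device %d: %zu of %zu bytes free\n", i, freeB, totalB);
        cuCtxDestroy(ctx);
    }
    return 0;
}
```

If another device turns out to be nearly full, restricting the application to its own GPU, for example via the CUDA_VISIBLE_DEVICES environment variable, would be one way to test whether this theory applies.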

If I understand you correctly, the out-of-memory error stems not from the device I am observing and have allocated resources on, but from another GPU that actually is out of memory, and on which my process would implicitly try, and fail, to create a context.

In the scenario you linked, this occurred because the call to cudaMemcpy synchronized with both devices and therefore needed a context on each of them; the implicit context creation then ran into the out-of-memory error.

Is that a plausible scenario during a call to cuMemGetInfo, which does not synchronize? I am also using the driver API, which to my understanding does not create contexts implicitly.
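For reference, what I mean by explicit context handling is the usual driver API pattern; a generic sketch (not my application’s code) would be:

```cpp
// Generic driver API setup for a single device; no other device is touched.
#include <cuda.h>

int main() {
    cuInit(0);

    CUdevice dev;
    cuDeviceGet(&dev, 0);       // only the device the application intends to use

    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);  // explicit creation; the new context becomes
                                // current on the calling thread

    size_t freeB = 0, totalB = 0;
    cuMemGetInfo(&freeB, &totalB);  // reports memory of the current context's device

    cuCtxDestroy(ctx);
    return 0;
}
```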

I don’t know, and wouldn’t know without writing a test case along the lines of the one I linked. Your previous description suggested the error happened at calls such as cuStreamCreate or cuEventRecord.

I don’t know exactly what is happening in your case.
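If you want to attempt such a test case, a rough, untested sketch for a multi-GPU machine could be along these lines; the program name, modes, and memory margin below are placeholders:

```cpp
// Untested sketch of a two-process experiment on a multi-GPU machine.
// Shell 1: ./oom_test fill 1   -- grab most of device 1's memory and hold it
// Shell 2: ./oom_test probe 0  -- exercise the failing calls on device 0
#include <cuda.h>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <thread>

static void report(CUresult r, const char *what) {
    const char *name = nullptr;
    cuGetErrorName(r, &name);
    std::printf("%s -> %s\n", what, name ? name : "unknown");
}

int main(int argc, char **argv) {
    if (argc < 3) {
        std::fprintf(stderr, "usage: %s fill|probe <device>\n", argv[0]);
        return 1;
    }
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, std::atoi(argv[2]));
    CUcontext ctx;
    report(cuCtxCreate(&ctx, 0, dev), "cuCtxCreate");

    if (std::strcmp(argv[1], "fill") == 0) {
        // Grab most of the device's free memory and hold it for a while.
        size_t freeB = 0, totalB = 0;
        cuMemGetInfo(&freeB, &totalB);
        size_t margin = 256u << 20;                              // leave ~256 MB free
        size_t bytes = freeB > margin ? freeB - margin : freeB / 2;
        CUdeviceptr p;
        report(cuMemAlloc(&p, bytes), "cuMemAlloc (fill)");
        std::this_thread::sleep_for(std::chrono::minutes(10));   // hold the allocation
    } else {
        // Exercise the calls that returned the error in the application.
        CUstream s;
        report(cuStreamCreate(&s, CU_STREAM_DEFAULT), "cuStreamCreate");
        CUevent e;
        report(cuEventCreate(&e, CU_EVENT_DEFAULT), "cuEventCreate");
        report(cuEventRecord(e, s), "cuEventRecord");
        size_t freeB = 0, totalB = 0;
        report(cuMemGetInfo(&freeB, &totalB), "cuMemGetInfo");
        std::printf("free = %zu, total = %zu\n", freeB, totalB);
    }
    return 0;
}
```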