CUDA/OpenGL interop memory coherence

Hello everyone,

I am planning to implement a medical image visualization application in which both the algorithms and the display run on the GPU, using the CUDA/OpenGL interop mechanism.

By studying the marching cubes example in the CUDA SDK, I noticed that there are two kinds of CUDA-allocated memory:

  1. VBO memory that is mapped/unmapped through the interop API to exchange data between computation and display.

  2. Memory allocated purely with cudaMalloc, for storage of the original volume data.
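To make the distinction concrete, here is a minimal sketch of the two allocation patterns, loosely following the structure of the marching cubes sample (the names `vboRes`, `vertices`, `d_volume`, etc. are illustrative, not taken from the sample itself):

```cuda
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// (1) VBO memory: created by OpenGL, registered once with CUDA,
//     then mapped/unmapped around each CUDA pass.
GLuint vbo;                                 // created via glGenBuffers/glBufferData
cudaGraphicsResource_t vboRes = nullptr;
cudaGraphicsGLRegisterBuffer(&vboRes, vbo, cudaGraphicsMapFlagsWriteDiscard);

// Per frame:
float4 *vertices = nullptr;
size_t  numBytes = 0;
cudaGraphicsMapResources(1, &vboRes, 0);    // hand the buffer over to CUDA
cudaGraphicsResourceGetMappedPointer((void **)&vertices, &numBytes, vboRes);
// ... launch kernels that write the vertex data ...
cudaGraphicsUnmapResources(1, &vboRes, 0);  // hand it back to OpenGL for drawing

// (2) Plain device memory: allocated with cudaMalloc and never
//     registered with or touched by OpenGL.
unsigned char *d_volume = nullptr;
cudaMalloc((void **)&d_volume, volumeBytes);
cudaMemcpy(d_volume, h_volume, volumeBytes, cudaMemcpyHostToDevice);
```

Only the buffer in (1) crosses the CUDA/OpenGL boundary; the buffer in (2) stays entirely on the CUDA side.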

My question: while the VBO memory is guaranteed to be coherent by explicit calls to the interop APIs, how do we guarantee that the purely CUDA-allocated memory remains uncontaminated by the OpenGL context?

The example does not use any visible mechanism and simply reuses the cudaMalloc'd memory every time it returns to the CUDA context. What will happen if we run short of graphics memory during OpenGL display? Will the CUDA data be swapped out to the hard disk? How would performance be affected?

Thank you for your response!


Is this on Windows or Linux?

Under Windows WDDM, it is possible for CUDA memory to be swapped out to system memory (not to disk, I don't think). This is because all device memory management happens through WDDM, and WDDM is allowed to do that. However, once you touch that memory in CUDA code, the driver/WDDM will bring it back in.

It's impossible to quantify the performance impact in general; obviously there will be some.

On Linux, the above is not possible. If you perform a cudaMalloc operation in CUDA, that will carve out a piece of GPU memory that will not move until you call cudaFree or the application terminates.

With respect to contamination/corruption, this should be impossible. Memory allocated via cudaMalloc is completely invisible to OpenGL. Any such contamination/corruption would be a bug of some sort.

Dear txbob,

Thank you very much! This clarifies our assumptions.

Yes, we use the Windows WDDM driver model. Based on your explanation of the WDDM driver, can I assume the reverse is also valid: that WDDM can swap OpenGL data out to system memory in order to accommodate cudaMalloc'd memory?

What is the theoretical maximum amount of memory that cudaMalloc can allocate on a 2 GB graphics card, taking into consideration the memory consumed by the CUDA kernels and the CUDA runtime context?



Sorry, I don't know much about OpenGL. However, since memory allocated via cudaMalloc can be "swapped out" to system memory when WDDM decides it's necessary, and since that memory absolutely will be swapped back onto the GPU when a GPU kernel touches it, it stands to reason that this might cause something else to be swapped out.

There is no theoretical maximum allocation. There are CUDA runtime API functions (e.g. cudaMemGetInfo) that will report the amount of free memory as well as total memory. At any point in time, the amount of free memory reported by the CUDA runtime API is the upper bound on what you can allocate at that moment via cudaMalloc. You will not be able to allocate that full amount, but something "close" to it.
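A small host-side sketch of querying that bound with cudaMemGetInfo (this just reports the numbers; the comment about fragmentation reflects the caveat above):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    // Reports free and total device memory for the current CUDA context.
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free: %zu MB, total: %zu MB\n",
           freeBytes >> 20, totalBytes >> 20);
    // In practice, a single cudaMalloc of exactly freeBytes will usually fail
    // due to fragmentation and allocation granularity; a somewhat smaller
    // request will typically succeed.
    return 0;
}
```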

Dear txbob,


Thank you very much!