Hello,
I’m writing an application which uses CUDA/OpenGL interoperability.
I understand that the documentation for cuGraphicsResourceGetMappedPointer, cuGraphicsResourceGetMappedMipmappedArray, and cuGraphicsSubResourceGetMappedArray mentions that a mapped pointer or a mapped (mipmapped) array may change every time the resource is mapped.
e.g. cuGraphicsResourceGetMappedPointer:
"The value set in pPointer may change every time that resource is mapped."
Is this true even when I don’t change buffer size or array dimensions between map/unmaps?
I’m not surprised if a mapped pointer changes when I reallocate or resize the original buffer, but in the case where I do neither (and just update the buffer contents), it would be useful if there were a guarantee that the mapped pointer doesn’t change.
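For context, this is the per-frame pattern I mean, shown as a hedged sketch with the driver API (function and variable names other than the CUDA calls are illustrative; `resource` is assumed to have been registered earlier with cuGraphicsGLRegisterBuffer):

```cpp
#include <cuda.h>
#include <cudaGL.h>

// Per-frame hand-off of a registered OpenGL buffer to CUDA (sketch).
void perFrameBufferAccess(CUgraphicsResource resource, CUstream stream)
{
    // Hand the buffer from OpenGL to CUDA.
    cuGraphicsMapResources(1, &resource, stream);

    // The returned pointer is only documented to be valid for this
    // map/unmap interval, so it is re-queried every frame rather than
    // cached across frames.
    CUdeviceptr ptr;
    size_t size;
    cuGraphicsResourceGetMappedPointer(&ptr, &size, resource);

    // ... launch kernels that read/write `ptr` on `stream` ...

    // Hand the buffer back to OpenGL.
    cuGraphicsUnmapResources(1, &resource, stream);
}
```

The question is whether the re-query of `ptr` is actually necessary when the buffer is never reallocated or resized.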
My next question is:
When we want to read/write a texture every frame through a CUarray, we need to create a surface object and a texture object every frame for the interop resource, since the CUarray itself changes every frame. Is this correct? (I issue CUDA kernel dispatches and OpenGL calls every frame.)
If this is correct, I’m curious about the performance of such object creation. Is it reasonable to create a surface object/texture object every frame? Does it have some implicit stream synchronization?
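Concretely, this is the per-frame flow I have in mind, as a hedged sketch (assuming `resource` was registered with cuGraphicsGLRegisterImage using CU_GRAPHICS_REGISTER_FLAGS_SURFACE_LDST; names other than the CUDA calls are illustrative):

```cpp
#include <cuda.h>

// Per-frame surface-object creation over the freshly mapped array (sketch).
void perFrameSurfaceAccess(CUgraphicsResource resource, CUstream stream)
{
    cuGraphicsMapResources(1, &resource, stream);

    // The array may differ from last frame's, so the surface object is
    // recreated from it each frame.
    CUarray array;
    cuGraphicsSubResourceGetMappedArray(&array, resource,
                                        0 /*arrayIndex*/, 0 /*mipLevel*/);

    CUDA_RESOURCE_DESC resDesc = {};
    resDesc.resType = CU_RESOURCE_TYPE_ARRAY;
    resDesc.res.array.hArray = array;

    CUsurfObject surf;
    cuSurfObjectCreate(&surf, &resDesc);

    // ... launch kernels that surf2Dread()/surf2Dwrite() through `surf` ...

    cuSurfObjectDestroy(surf);
    cuGraphicsUnmapResources(1, &resource, stream);
}
```

It is the cost of the cuSurfObjectCreate/cuSurfObjectDestroy pair every frame that I am asking about.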
If I wanted to make a texture available for both reading and writing on the CUDA side, I would just use a surface.
If I wanted to do this frame-by-frame, I would probably set up two surfaces and ping-pong between them.
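As a hedged sketch of that ping-pong idea, using two plain CUDA arrays created once before the render loop (all names and sizes here are illustrative; error checking omitted):

```cpp
#include <cuda.h>
#include <utility>

// Create a 2D single-channel float array usable by a surface object,
// then wrap it in a surface object (sketch).
CUsurfObject makeSurface(size_t width, size_t height)
{
    CUDA_ARRAY3D_DESCRIPTOR ad = {};
    ad.Width = width;
    ad.Height = height;
    ad.Depth = 0;                          // 0 depth -> 2D array
    ad.Format = CU_AD_FORMAT_FLOAT;
    ad.NumChannels = 1;
    ad.Flags = CUDA_ARRAY3D_SURFACE_LDST;  // required for surface load/store

    CUarray array;
    cuArray3DCreate(&array, &ad);

    CUDA_RESOURCE_DESC rd = {};
    rd.resType = CU_RESOURCE_TYPE_ARRAY;
    rd.res.array.hArray = array;

    CUsurfObject surf;
    cuSurfObjectCreate(&surf, &rd);
    return surf;
}

void renderLoop(int numFrames)
{
    // Created once, before the loop.
    CUsurfObject surfRead  = makeSurface(1024, 1024);
    CUsurfObject surfWrite = makeSurface(1024, 1024);

    for (int frame = 0; frame < numFrames; ++frame) {
        // ... kernel reads surfRead, writes surfWrite ...

        // Last frame's output becomes this frame's input.
        std::swap(surfRead, surfWrite);
    }
}
```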
Regardless of mechanism, the OpenGL<->CUDA handoff involves synchronization.
I don’t have technical details of surface creation (or texture object creation), beyond what is published. It may very well have an effect like cudaMalloc.
Surface object and texture object creation requires an array.
Using CUDA after OpenGL calls requires mapping the resource.
Mapping the resource may produce a different array each time.
Therefore, it seems impossible to create only two surface objects for ping-pong before the rendering loop if I understand correctly.
It is not a problem that mapping the resource involves GPU-GPU synchronization (i.e., between OpenGL shaders and CUDA dispatches). My concern is whether calling the mapping function on the host side causes CPU-GPU synchronization, such as waiting for all already-issued CUDA dispatches regardless of the CUstream passed.
Yes, you need to create the surface object each time you map the resource (acquire the CUDA array) from the OpenGL side. That is part of the hand-off from OpenGL to CUDA. That has synchronization.
I’m not aware, for example, that calling cudaCreateSurfaceObject has its own synchronizing effect (like cudaDeviceSynchronize()) but it may.
What would seem logical to me is that the mapping function synchronizes on pending OpenGL work, and the unmapping function synchronizes on pending CUDA work, but I haven’t checked that out. You might be able to use a methodology similar to the one above to answer that.
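One possible methodology, as a hedged sketch (runtime API used for brevity; the spin duration and all names are illustrative, and this assumes `resource` is an already-registered interop resource): launch a deliberately long kernel, then time the host-side map call. If the measured wall-clock time approaches the kernel's duration, the map is waiting on already-issued CUDA work; if it returns almost immediately, it is not.

```cpp
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

// Busy-wait on the GPU for roughly `cycles` clock cycles.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { /* keep the GPU busy */ }
}

void probeMapSynchronization(cudaGraphicsResource_t resource,
                             cudaStream_t stream)
{
    // Issue long-running CUDA work first (illustrative cycle count;
    // on the order of a second on many GPUs).
    spin<<<1, 1>>>(2000000000LL);

    // Time only the host-side map call.
    auto t0 = std::chrono::steady_clock::now();
    cudaGraphicsMapResources(1, &resource, stream);
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("map call took %.3f ms on the host\n", ms);

    cudaGraphicsUnmapResources(1, &resource, stream);
    cudaDeviceSynchronize();
}
```

The same timing harness, with the spin kernel replaced by pending OpenGL work, could probe the unmap direction.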