Does anyone know whether some of the GL interop commands (cuGLMapBufferObject, cuGLUnmapBufferObject, …) cause an implicit cudaThreadSynchronize() (cuCtxSynchronize() in the driver API)?
The answer matters a lot for the design of my project, since I'm going to combine GL interop with CUDA 1.1's asynchronous execution.
Not entirely sure, but it would make sense for the buffer mapping functions to be blocking: they take no stream argument, and they do affect memory (pointer manipulation). That said, it's possible the function blocks only until the pointer is safe to use, not necessarily until all previously issued calls have completed.
For now, try this: time the buffer mapping call with and without asynchronous CUDA calls (kernels, memcopies, whatever) queued ahead of it. If the map takes noticeably longer when work is pending, it is waiting for all preceding calls to complete.
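Here's a rough sketch of that experiment. It's a small variation on plain timing. Instead of a stopwatch, it queues a deliberately long kernel, records an event after it, makes the call under test, and then queries the event. If the event is already complete when the call returns, the call must have waited for the earlier work. Some assumptions: it uses the current runtime API (cudaDeviceSynchronize() was cudaThreadSynchronize() back in CUDA 1.1), and it needs clock64(), which means compute capability 2.0 or newer. spinKernel, waitsForPrecedingWork and the spin length are made-up names and values. The actual cuGLMapBufferObject() call isn't shown, because it needs a current GL context and a buffer registered with cuGLRegisterBufferObject(). You pass it in as the callUnderTest lambda.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: busy-waits on the GPU for about `cycles` SM clock
// ticks so there is still unfinished asynchronous work when the call under
// test is made. clock64() assumes compute capability 2.0 or newer.
__global__ void spinKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

// Hypothetical helper: queues a long kernel asynchronously, makes
// `callUnderTest`, then checks whether the kernel has already finished.
// Returns true if the call did not return until the queued work completed.
template <typename F>
bool waitsForPrecedingWork(F callUnderTest, long long spinCycles)
{
    cudaEvent_t kernelDone;
    cudaError_t err = cudaEventCreate(&kernelDone);
    assert(err == cudaSuccess);

    spinKernel<<<1, 1>>>(spinCycles);      // asynchronous: returns at once
    err = cudaEventRecord(kernelDone, 0);  // completes only after spinKernel
    assert(err == cudaSuccess);
    (void)err;                             // silence unused warning under NDEBUG

    callUnderTest();

    // cudaSuccess: the event (and so the kernel) has completed.
    // cudaErrorNotReady: the call returned while work was still pending.
    bool waited = (cudaEventQuery(kernelDone) == cudaSuccess);

    cudaDeviceSynchronize();               // drain the queue before cleanup
    cudaEventDestroy(kernelDone);
    return waited;
}
```

Calibrate it first. A no-op lambda such as `[] {}` should give false, and `[] { cudaDeviceSynchronize(); }` should give true. A spin of about 1e9 cycles is roughly half a second to a second on most GPUs, which keeps it under the display watchdog. After that, wrap the map call the same way, for example `[&] { cuGLMapBufferObject(&dptr, &size, vbo); }` (the variable names are hypothetical), and repeat the test for cuGLUnmapBufferObject.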