Porting Over To Copy Engine Texture Processing

scannerman · April 17, 2015, 5:43pm

I have a fairly complex multithreaded film scanning application developed with Visual Studio C++ for Windows. The current OpenGL implementation is as follows:

Performs all OpenGL initialization, texture and shader creation in the application thread.
Initializes the main viewing window with a rendering context in the application thread.
During scan execution, creates a separate worker thread that continuously uploads images, renders to an FBO, and downloads finished images, with no parallel GPU activity.
Uses the application GL thread to display finished images in the application viewing window.
Uses a separate worker thread to write finished image buffers to disk.

I have studied the NVIDIA Copy Engine white paper, as well as Chapters 28 and 29 of the OpenGL Insights textbook, but am still somewhat confused about the proper OpenGL thread structure for taking advantage of the Quadro dual copy engines. The OpenGL Insights sample code is difficult for me to parse, as it uses a C++ class that encapsulates many of the OpenGL calls.

My initial questions are:

In the NVIDIA examples, the application thread is used for GL rendering and shares its rendering context (wglShareLists) with the upload thread. If my upload, render and download threads are all separate worker threads, does this change the context-sharing structure?
Why does the render thread need to share contexts only with the upload thread, and not with the download (finished frame) thread?
Regarding the pixel buffer objects (PBOs), I don’t quite understand the purpose of using two sets of buffers for both uploading and downloading. Is the reason to use one set for even frames and one for odd frames, or is it to use one to load from host memory and the other to copy the data from the first PBO to the second during a single frame transfer?

If there is any dual copy-engine code out there that uses native OpenGL calls exclusively, I’d appreciate a link to it.

busta78 · April 18, 2015, 12:56am

From my understanding, the NVIDIA copy engine is nothing more than a glorified DMA controller. PBOs, as required by OpenGL, allow asynchronous transfers, so I would expect any OpenGL implementation that supports PBOs to have some form of DMA facility in order to be efficient. Also, are you working with Quadros?

Going by what you mentioned, if the app thread shares only with the upload thread, and the download thread shares neither directly (with the app thread) nor indirectly (through the upload thread), then the download thread must not make any GL calls that use resources belonging to the other threads. There is no need to share contexts unless a resource needs to be shared (used) by both threads.
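The direct/indirect sharing rule can be modeled as a union-find over share groups. This is a hypothetical model, not GL code: `ShareGroups`, `Ctx` and the role names are illustrative, and real code would call wglShareLists between the contexts instead. It shows that the download context can use the render context's objects only if it joins the group directly or through the upload context. Only objects such as textures and buffers are shared; container objects such as FBOs are never shared between contexts, so a download thread would work with the texture attached to an FBO, not with the FBO itself.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical context roles; one GL context per thread.
enum Ctx { kRenderCtx = 0, kUploadCtx = 1, kDownloadCtx = 2, kNumCtx = 3 };

// Model of share groups: sharing two contexts merges their groups,
// so sharing is transitive (download reaches render via upload).
class ShareGroups {
public:
    explicit ShareGroups(int contexts)
        : parent_(static_cast<std::size_t>(contexts)) {
        std::iota(parent_.begin(), parent_.end(), 0);  // each context starts alone
    }

    // Record a wglShareLists-style share between contexts a and b.
    void share(int a, int b) { parent_[find(a)] = find(b); }

    // True if textures/buffers created in one context are usable in the other.
    bool shares(int a, int b) { return find(a) == find(b); }

private:
    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(parent_.size()));
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }

    std::vector<int> parent_;
};
```

For example, after `share(kUploadCtx, kRenderCtx)` the download context is still isolated; a later `share(kDownloadCtx, kUploadCtx)` is enough to give it the render context's textures, without a direct share with the render thread.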
Ping-ponging buffers is a common practice to keep GL from stalling on resource usage or modification. Remember that when you submit a GL call, it appears synchronous to the application, but the call may not actually execute on the device until several frames later. If you upload data to PBO p in frame n and then upload to p again in frame n+1, p may still be in use by the work submitted in frame n. To ensure coherency, the driver may have to make a copy of the resource or, even worse, pause all operations until it has finished using the resource. That is an over-generalization of what happens, but the point is that keeping several buffers in flight minimizes that particular case.
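The ping-pong argument can be checked with a small model. This is a sketch under assumptions: `BufferRing` is a hypothetical name, and the GPU is assumed to keep each buffer busy for a fixed number of frames (`latency`) after submission. In real GL code a fence (glFenceSync, followed by glClientWaitSync or glWaitSync) would report when a buffer is actually free. Frame n uses buffer n % count, and the model counts how often a frame would reuse a buffer that is still in flight, which is the case where the driver would copy or stall.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of PBO ping-ponging with a fixed GPU latency.
class BufferRing {
public:
    BufferRing(std::size_t count, long latency)
        : busyUntil_(count, -1), latency_(latency) {
        assert(count > 0 && latency >= 0);
    }

    // Returns the buffer index for frame n; counts a stall if that buffer
    // is still held by work submitted in an earlier frame.
    std::size_t acquire(long frame) {
        assert(frame >= 0);
        std::size_t i = static_cast<std::size_t>(frame) % busyUntil_.size();
        if (busyUntil_[i] > frame) ++stalls_;
        busyUntil_[i] = frame + latency_;  // busy until this frame (exclusive)
        return i;
    }

    long stalls() const { return stalls_; }

private:
    std::vector<long> busyUntil_;  // first frame at which each buffer is free
    long latency_;
    long stalls_ = 0;
};
```

With a latency of two frames, a single buffer collides on every frame after the first, while two buffers never collide in this model. In general the model stalls only when the latency exceeds the buffer count, so the count needs to cover the frames in flight; beyond that, extra buffers mainly cost memory.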

scannerman · April 18, 2015, 4:23am

Thanks, busta78.

I use ten host buffers for both uploading to GL and downloading from GL. Would it be advisable to create 10 PBO/texture pairs for uploading, and 10 PBOs and FBO render textures for downloading?

The example I described was the NVIDIA example code. My download PBOs will read from a texture attached to an FBO that the render thread writes to. So do I need to share the FBO-attached texture with the download thread?