I need to run some image processing on a sequence of input images that will be produced at a fixed rate, say 30 fps. The images do not need to be processed in the order they are generated.
My thought was to just queue up the work, all asynchronously:
For each image as it arrives:
- Upload to the GPU (non-blocking on the CPU)
- Do the work (waits until the upload event is signaled, but non-blocking on the CPU)
- Download the results to the CPU (waits until the work-done event is signaled, but non-blocking on the CPU)
Then periodically poll to see which downloads are complete. Is this the right approach?
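To make the question concrete, here is a minimal sketch of what I have in mind. It assumes CUDA, which is only one possible API for this; the same idea should map to command queues and events elsewhere. Each image gets its own stream, so the upload, kernel, and download stay ordered on the GPU without an explicit wait, and an event recorded after the download is what the CPU polls. The `invert` kernel, the `Slot` struct, and `run_pipeline` are placeholder names made up for illustration, not real processing code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_)); std::exit(1); } } while (0)

// Placeholder for the real image processing: inverts 8-bit pixels.
__global__ void invert(const unsigned char* in, unsigned char* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 255 - in[i];
}

// Everything one in-flight image needs (hypothetical layout).
struct Slot {
    cudaStream_t stream;
    cudaEvent_t done;             // recorded after the download is queued
    unsigned char *h_in, *h_out;  // pinned host buffers
    unsigned char *d_in, *d_out;  // device buffers
    bool collected;
};

// Queues upload, kernel, and download for every frame, then polls for completion.
// Returns the number of frames whose results came back correct.
int run_pipeline(int num_frames, int n) {
    std::vector<Slot> slots(num_frames);
    for (int f = 0; f < num_frames; ++f) {
        Slot& s = slots[f];
        s.collected = false;
        // Allocating inside the loop is only for brevity; real code would
        // preallocate a fixed pool of slots and reuse them.
        CHECK(cudaStreamCreate(&s.stream));
        CHECK(cudaEventCreateWithFlags(&s.done, cudaEventDisableTiming));
        CHECK(cudaMallocHost((void**)&s.h_in, n));   // pinned memory, needed for async copies
        CHECK(cudaMallocHost((void**)&s.h_out, n));
        CHECK(cudaMalloc((void**)&s.d_in, n));
        CHECK(cudaMalloc((void**)&s.d_out, n));

        // "Image arrives": fill the input with a per-frame test pattern.
        for (int i = 0; i < n; ++i) s.h_in[i] = (unsigned char)((i + f) & 0xFF);

        // These calls are queued on the stream and do not wait for the GPU.
        CHECK(cudaMemcpyAsync(s.d_in, s.h_in, n, cudaMemcpyHostToDevice, s.stream));
        invert<<<(n + 255) / 256, 256, 0, s.stream>>>(s.d_in, s.d_out, n);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(s.h_out, s.d_out, n, cudaMemcpyDeviceToHost, s.stream));
        CHECK(cudaEventRecord(s.done, s.stream));
    }

    // Poll: frames may complete in any order.
    int ok = 0, remaining = num_frames;
    while (remaining > 0) {
        for (Slot& s : slots) {
            if (s.collected) continue;
            cudaError_t st = cudaEventQuery(s.done);
            if (st == cudaErrorNotReady) continue;  // still in flight
            CHECK(st);
            bool good = true;
            for (int i = 0; i < n && good; ++i)
                good = (s.h_out[i] == (unsigned char)(255 - s.h_in[i]));
            ok += good;
            s.collected = true;
            --remaining;
        }
        // A real application would do other CPU work here between polls.
    }

    for (Slot& s : slots) {
        cudaFreeHost(s.h_in); cudaFreeHost(s.h_out);
        cudaFree(s.d_in); cudaFree(s.d_out);
        cudaEventDestroy(s.done); cudaStreamDestroy(s.stream);
    }
    return ok;
}

int main() {
    const int frames = 8, pixels = 1 << 20;
    int ok = run_pipeline(frames, pixels);
    std::printf("%d of %d frames processed correctly\n", ok, frames);
    return ok == frames ? 0 : 1;
}
```

In this sketch the polling loop simply spins; in the real application I would check the events once per incoming frame or on a timer instead.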
Also, can memory transfers (CPU to GPU and GPU to CPU) run in parallel with other work? In other words, while the GPU is processing one image, could it start uploading the next image and start downloading the previous results to the CPU?