Thread-safe asynchronous OpenCV CUDA Stream operations on live video

I’m currently developing some preliminary code using OpenCV’s CUDA libraries for an application that will eventually involve SLAM, or at least some form of navigation and/or mapping in GPS-denied areas. I’ve spent the last couple of days getting familiar with OpenCV’s CUDA functionality, and began working with live video today. At present, the video is streamed over TCP using GStreamer from a Raspberry Pi compute module on a StereoPi V1 carrier board to a Jetson Nano, which handles the receiving end of the pipeline in a C++ OpenCV program.

All of this works fine, but in an effort to reduce latency for operations like depth mapping and keypoint detection, computation, and matching, I want to look into asynchronous threading and, more generally, get a better idea of how to use GPU and CPU memory and resources optimally. I’ve been digging around on forums for quite some time, but there seems to be very little documented use of OpenCV’s CUDA Streams. Any direction on where to look for examples or better documentation would be much appreciated.
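For reference, here’s a rough sketch of the kind of asynchronous pattern I have in mind (untested — the GStreamer pipeline string, the side-by-side stereo frame layout, and the StereoBM parameters are all placeholders for my actual setup):

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/cudastereo.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/highgui.hpp>

int main() {
    // Placeholder GStreamer receive pipeline -- substitute the real one.
    cv::VideoCapture cap("tcpclientsrc host=... ! ... ! appsink",
                         cv::CAP_GSTREAMER);
    if (!cap.isOpened()) return 1;

    // One CUDA stream per eye so the two preprocessing chains can overlap.
    cv::cuda::Stream streamL, streamR;
    cv::cuda::GpuMat d_left, d_right, d_grayL, d_grayR, d_disp;

    // Assumed parameters; tune for the actual baseline/resolution.
    cv::Ptr<cv::cuda::StereoBM> stereo = cv::cuda::createStereoBM(64, 19);

    cv::Mat frame, disp;
    while (cap.read(frame)) {
        // Assumption: the StereoPi delivers both eyes side by side in one frame.
        cv::Mat left  = frame(cv::Rect(0, 0, frame.cols / 2, frame.rows));
        cv::Mat right = frame(cv::Rect(frame.cols / 2, 0,
                                       frame.cols / 2, frame.rows));

        // Enqueue upload + grayscale conversion on each stream; these calls
        // return immediately and the two chains can execute concurrently.
        d_left.upload(left, streamL);
        d_right.upload(right, streamR);
        cv::cuda::cvtColor(d_left,  d_grayL, cv::COLOR_BGR2GRAY, 0, streamL);
        cv::cuda::cvtColor(d_right, d_grayR, cv::COLOR_BGR2GRAY, 0, streamR);

        // StereoBM needs both inputs ready, so wait for the right-eye chain
        // before enqueueing the match on the left stream.
        streamR.waitForCompletion();
        stereo->compute(d_grayL, d_grayR, d_disp, streamL);
        d_disp.download(disp, streamL);

        // Block only at the point the CPU actually needs the result.
        streamL.waitForCompletion();
        cv::imshow("disparity", disp);
        if (cv::waitKey(1) == 27) break;
    }
    return 0;
}
```

One thing I’m unsure about is the `streamR.waitForCompletion()` call, which blocks the host thread — I believe the cleaner approach is to record a `cv::cuda::Event` on `streamR` and have `streamL` wait on it via `Stream::waitEvent`, so the dependency stays on the GPU.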

By default we don’t enable OpenCV CUDA, so this would need other users to share their experience.

We would also like to suggest trying VPI, which can provide better performance on Jetson platforms. Please check
Trying to get OpenCV (built with CUDA) working with FFMPEG - #6 by DaneLLL

Hi DaneLLL, thanks for the tip regarding VPI. I’m already somewhat familiar with OpenCV and would prefer to stay with it, but I will look into VPI. In general, I’d still love to know where to look for information on CUDA Streams/threading if there’s an official resource for it. It seems that not many people have experience with OpenCV’s CUDA Streams implementation, but on the off chance that another user has examples or insight from a simple use case, that would be great.
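In case it helps anyone who lands here later, the one concrete thing I’ve pieced together so far: uploads and downloads only genuinely overlap with host work when the host-side buffers are page-locked, which in OpenCV means allocating them through `cv::cuda::HostMem`. A minimal (untested) sketch, with the frame size assumed:

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>

int main() {
    const int rows = 720, cols = 1280;  // assumed frame size

    // Page-locked (pinned) host buffers: required for memcpys that are
    // truly asynchronous with respect to the host thread.
    cv::cuda::HostMem h_src(rows, cols, CV_8UC3, cv::cuda::HostMem::PAGE_LOCKED);
    cv::cuda::HostMem h_dst(rows, cols, CV_8UC1, cv::cuda::HostMem::PAGE_LOCKED);

    // Mat headers that view the pinned memory; decode frames into h_frame.
    cv::Mat h_frame = h_src.createMatHeader();
    cv::Mat h_gray  = h_dst.createMatHeader();

    cv::cuda::Stream stream;
    cv::cuda::GpuMat d_frame, d_gray;

    // ... fill h_frame from the capture source here, then:
    d_frame.upload(h_src, stream);                                 // async H2D
    cv::cuda::cvtColor(d_frame, d_gray, cv::COLOR_BGR2GRAY, 0, stream);
    d_gray.download(h_dst, stream);                                // async D2H

    // The host thread is free to do other work here (e.g. decode the
    // next frame) before synchronizing.
    stream.waitForCompletion();
    // h_gray now views the result.
    return 0;
}
```

Since the Jetson’s CPU and GPU share physical memory, I suspect the `cv::cuda::HostMem::SHARED` allocation type (zero-copy) may be worth exploring there as well, but I haven’t tested it.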