OpenCV Tegra optimizations needed for 60 FPS

I have a raw video stream coming into the TX2 from a USB 3.0 camera interface, which is then processed using OpenCV.

The image processing operations include the following:

RGB to GRAY/ color conversions

To reach 60 FPS, the time available for processing and displaying each frame is 16.67 ms. Currently, using the CUDA libraries in OpenCV, it takes around 100 ms. I would like to understand whether this can be improved.
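To make the gap concrete, here is the arithmetic behind those numbers as a small Python sketch. The function name is only illustrative, and it assumes the 100 ms figure is the time spent per frame.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

target_fps = 60
current_ms = 100.0  # measured time from the question, assumed to be per frame

budget_ms = frame_budget_ms(target_fps)
current_fps = 1000.0 / current_ms
required_speedup = current_ms / budget_ms

print(f"budget per frame: {budget_ms:.2f} ms")      # 16.67 ms
print(f"current rate:     {current_fps:.1f} FPS")   # 10.0 FPS
print(f"speedup needed:   {required_speedup:.1f}x") # 6.0x
```

So the processing has to get roughly six times faster, or be overlapped well enough that a new frame completes every 16.67 ms.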

It sounds like you’re spending too much time shuffling data back and forth between the CPU and the GPU, and perhaps also not getting enough pipelining (overlap, parallelism) going.
Without seeing the code or knowing more, it’s impossible to know where this is happening.
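One way to find out where the time goes is to time each stage of the loop separately. The sketch below is only an illustration: the stage names (upload, convert, download) are guesses at what the loop does, and the `time.sleep` calls stand in for the real work. With real CUDA calls, remember that kernel launches are asynchronous, so you would need to synchronize before stopping a timer or the cost shows up in the wrong stage.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
stage_totals = defaultdict(float)

@contextmanager
def timed(stage):
    """Add the wall-clock time of the enclosed block to stage_totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start

# Hypothetical per-frame loop; the sleeps are placeholders for real work.
num_frames = 3
for _ in range(num_frames):
    with timed("upload"):      # CPU -> GPU copy
        time.sleep(0.02)
    with timed("convert"):     # e.g. the color conversion itself
        time.sleep(0.002)
    with timed("download"):    # GPU -> CPU copy
        time.sleep(0.01)

for stage, total in stage_totals.items():
    print(f"{stage:>8}: {1000 * total / num_frames:.2f} ms per frame")

slowest_stage = max(stage_totals, key=stage_totals.get)
print("slowest stage:", slowest_stage)
```

If the copies dominate, as in this made-up example, the fix is to keep the data on the GPU between steps rather than to speed up the kernels.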

How do you capture the image? How big is it? What format?

For reasonably sized images (say, 2k x 1k or less), you should be able to do all of that on the GPU in a few milliseconds, assuming you write and chain CUDA kernels (or plain OpenGL framebuffer objects) without involving the CPU. That also assumes you queue the next frame's work before fetching the results of the previous one, so that you get good pipelining.
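To show why the overlap matters, here is a rough CPU-side analogue using plain Python threads and queues. It is not CUDA code: the three stages and their durations are made up, and `time.sleep` stands in for the capture, processing and display work. On the GPU the same effect would come from issuing work asynchronously (for example on CUDA streams) instead of waiting for each frame to finish.

```python
import queue
import threading
import time

def run_pipeline(num_frames, stage_s):
    """Run capture, process and display in separate threads linked by queues.

    Returns the frame indices in display order and the elapsed seconds.
    """
    to_process = queue.Queue(maxsize=2)
    to_display = queue.Queue(maxsize=2)
    shown = []

    def capture():
        for i in range(num_frames):
            time.sleep(stage_s)          # stand-in for grabbing a frame
            to_process.put(i)
        to_process.put(None)             # sentinel: no more frames

    def process():
        while True:
            frame = to_process.get()
            if frame is None:
                to_display.put(None)     # pass the sentinel downstream
                break
            time.sleep(stage_s)          # stand-in for the GPU work
            to_display.put(frame)

    def display():
        while True:
            frame = to_display.get()
            if frame is None:
                break
            time.sleep(stage_s)          # stand-in for showing the frame
            shown.append(frame)

    threads = [threading.Thread(target=f) for f in (capture, process, display)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shown, time.perf_counter() - start

num_frames, stage_s = 10, 0.02
shown, elapsed = run_pipeline(num_frames, stage_s)
serial = num_frames * 3 * stage_s        # cost if each frame ran start to finish
print(f"pipelined: {elapsed:.2f} s, serial would be at least {serial:.2f} s")
```

Once the pipeline is full, a frame comes out once per stage time rather than once per sum of all stage times, which is what makes a 16.67 ms frame interval reachable even when the end-to-end latency of one frame is longer.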


OpenCV4Tegra uses the FFmpeg CPU-based decoder.
Please build OpenCV with the GStreamer and V4L2 options enabled to use the hardware decoder.
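Once OpenCV is built with GStreamer, a capture could be opened through a GStreamer pipeline roughly as sketched below. The device path, resolution, frame rate and helper names are assumptions for illustration and need to match what your camera actually delivers. `cv2` is imported inside the function only so that the pipeline string can be built and inspected on a machine without OpenCV.

```python
def build_pipeline(device="/dev/video0", width=1280, height=720, fps=60):
    """Build a GStreamer pipeline string that ends in an appsink for OpenCV.

    All default values are placeholders, not values from this thread.
    """
    return (
        f"v4l2src device={device} ! "
        f"video/x-raw, width={width}, height={height}, framerate={fps}/1 ! "
        "videoconvert ! video/x-raw, format=BGR ! "
        "appsink drop=true max-buffers=1"
    )

def has_gstreamer(build_info):
    """Return True if OpenCV's build information reports GStreamer as enabled."""
    return any(
        "GStreamer" in line and "YES" in line
        for line in build_info.splitlines()
    )

def open_capture(pipeline):
    """Open the pipeline with OpenCV's GStreamer backend (needs OpenCV)."""
    import cv2  # deferred import, see the note above
    if not has_gstreamer(cv2.getBuildInformation()):
        raise RuntimeError("this OpenCV build does not have GStreamer enabled")
    cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise RuntimeError("could not open pipeline: " + pipeline)
    return cap

pipeline = build_pipeline()
print(pipeline)
```

Checking `cv2.getBuildInformation()` first is a quick way to confirm that the rebuilt library, and not the stock OpenCV4Tegra one, is the one being loaded.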