If that’s uncommented, it could be part of the issue. You might try resizing on the ISP before you map the buffer for the CPU and create a cv::Mat. That’s what I’ve done to maintain 60 fps while still doing some trivial work in OpenCV. You can also reuse these buffers to avoid repeated reallocation. If you’re just looking to display some video, OpenCV probably isn’t going to be very performant no matter what you do.
I can’t paste what I’ve done since it’s proprietary, but I can tell you I used:
NvBufferCreateEx (to create a scratch buffer)
NvBufferGetParamsEx (to get some info from that buffer)
NvBufferMemMap (to map the scratch buffer)
ExtractFdFromNvBuffer (to get an fd from the in buffer)
NvBufferGetParams (to get parameters from the in fd)
NvBufferTransform (to convert and transform the buffer from in to the scratch)
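Roughly, those calls chain together like the sketch below. This is a from-memory outline against the `nvbuf_utils.h` interface, not the actual proprietary code: the struct field names, flags, and the low-res target size are illustrative and worth checking against the Multimedia API headers before use. Note that the scale/convert step runs on the hardware converter (VIC/ISP), not the CPU.

```cpp
// Sketch only -- verify field names and flags against nvbuf_utils.h.
#include "nvbuf_utils.h"
#include <opencv2/core.hpp>

static int scratch_fd = -1;        // scratch buffer, created once and reused
static void *scratch_ptr = nullptr;

// One-time setup: create a small greyscale scratch buffer and map it for the CPU.
void setup_scratch(int width, int height) {
    NvBufferCreateParams cp = {};
    cp.width = width;                          // low-res target, e.g. 640
    cp.height = height;                        // e.g. 360
    cp.layout = NvBufferLayout_Pitch;
    cp.colorFormat = NvBufferColorFormat_GRAY8;
    cp.payloadType = NvBufferPayload_SurfArray;
    NvBufferCreateEx(&scratch_fd, &cp);
    NvBufferMemMap(scratch_fd, 0, NvBufferMem_Read, &scratch_ptr);
}

// Per frame: let the hardware do the resize + colour conversion,
// then wrap the result in a cv::Mat without copying.
cv::Mat process_frame(void *in_nvbuffer) {
    int in_fd = -1;
    ExtractFdFromNvBuffer(in_nvbuffer, &in_fd);

    NvBufferTransformParams tp = {};
    tp.transform_flag = NVBUFFER_TRANSFORM_FILTER;
    tp.transform_filter = NvBufferTransform_Filter_Smart;
    NvBufferTransform(in_fd, scratch_fd, &tp);  // scale + convert in hardware

    NvBufferParams p = {};
    NvBufferGetParams(scratch_fd, &p);
    NvBufferMemSyncForCpu(scratch_fd, 0, &scratch_ptr); // make writes CPU-visible
    return cv::Mat(p.height[0], p.width[0], CV_8UC1, scratch_ptr, p.pitch[0]);
}
```

The cv::Mat at the end is a zero-copy view over the mapped scratch buffer, so it’s only valid while the buffer stays mapped.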
And only at the end, more or less what you did to create a low-res, greyscale cv::Mat, which is exactly what we needed for blob detection. Basically, the only way we were able to make it fast was by doing everything possible on the ISP and using OpenCV only as the last step. We were also able to do all the processing in a separate worker thread so playback wasn’t blocked (the result of the OpenCV calculation did not have to be synchronized with playback).
Forewarning: the documentation is pretty good, but the examples are so-so and the interface is not nearly as nice as OpenCV’s; it’s pure C, and something more C++-like would be nice. You’ll likely have to read and experiment a lot before it works the way you want, but once you get it working, I don’t think you can beat the performance. In our case, the preprocessing was basically “free”.