Hi, I am running a simple VPI pipeline that has the following operations: image rescale, perspective warp, and finally convert image format. I want to have these operations run in real-time on a camera stream, and encode the output to H264 to send it out over the network.
I’ve attached a bare version of the code that I use for this pipeline. I use libArgus to capture the camera frames, wrap each buffer into a VPIImage, and then perform the operations, using VPIEvents for timing.
I am not getting the performance I expected. For example, the perspective warp operation typically takes >3 ms, whereas based on the performance benchmarks listed here (VPI - Vision Programming Interface: Perspective Warp) I would expect <1 ms (Jetson Nano, CUDA backend, image: 1920x1080 / NV12ER, linear interpolation).
The image format conversion also seems to take too long (4-5 ms instead of the expected 1-2 ms).
I am not sure where this problem comes from; could it be that the VPIEvents add some overhead? Interestingly, when I run the same operations in the benchmarking code from here (VPI - Vision Programming Interface: Benchmarking), the timings do match those given in the performance tables of the documentation. The only clear difference I see is that the tests there are batched and the timings averaged, so far fewer events are recorded.
In addition, I would like to pipe the output of these operations to an H.264 encoder, preferably through GStreamer. What is the best way to do this as efficiently as possible? I have tried copying the data into a cv::Mat (as in the attached code, except I use imshow there for debugging) and then feeding it to a GStreamer pipeline, but this is slow, I suspect because of CPU<->GPU memory copies.
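For reference, the GStreamer path I tried looks roughly like the sketch below (not my exact code; nvvidconv/nvv4l2h264enc are the Jetson hardware-accelerated elements on my JetPack version, and the host/port are placeholders):

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    // appsrc feeds CPU BGR frames; nvvidconv moves them into NVMM memory
    // for the hardware H.264 encoder. Host and port are placeholders.
    std::string pipeline =
        "appsrc ! video/x-raw,format=BGR ! videoconvert ! "
        "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! "
        "nvv4l2h264enc insert-sps-pps=true ! h264parse ! "
        "rtph264pay ! udpsink host=192.168.0.10 port=5000";

    cv::VideoWriter writer(pipeline, cv::CAP_GSTREAMER, 0 /*fourcc*/,
                           30.0, cv::Size(1920, 1080), /*isColor=*/true);
    if (!writer.isOpened())
        return 1;

    cv::Mat frame(1080, 1920, CV_8UC3, cv::Scalar(0, 0, 0));
    // ...copy the VPI output into 'frame' (this is the CPU copy I suspect
    // is the bottleneck)...
    writer.write(frame);
    return 0;
}
```

Ideally I would hand the VPI/CUDA buffer to the encoder directly in NVMM memory and skip the appsrc BGR copy entirely, but I don't know what the recommended way to do that is.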
main.cpp (9.5 KB)