VPI dense optical flow not performant when in parallel with streaming output

liellplane · April 26, 2023, 11:36am

Hello,

I am using VPI functions with the CUDA backend in a main thread, which ends by sending each image to a server through an OpenCV VideoWriter with a GStreamer pipeline:


import cv2

def gstreamer_out():
    # leaky downstream throws away old images - default queue is 5
    # sync = false might be useful
    # not tested with real cameras
    #MUX playback ID https://stream.mux.com/vL9SJU61FSv8sSQR01F6ajKI702WeK2pXRuLVtw25zquo.m3u8

    return (
        "appsrc ! "
        "videoconvert ! "
        "video/x-raw, framerate=(fraction)25/1, format=RGBA ! "
        "nvvidconv ! "
        "nvv4l2h264enc ! "
        "h264parse ! "
        "flvmux ! "
        "queue leaky=downstream ! "
        "rtmpsink location=rtmp://global-live.mux.com:5222/app/51bc0427-ad29-2909-4979-11ee335d2b53 sync=false"
    )

# Created outside the function; output_size is the (width, height) tuple
# defined elsewhere in the script.
out_stream = cv2.VideoWriter(
    filename=gstreamer_out(),
    apiPreference=cv2.CAP_GSTREAMER,
    fourcc=0,
    fps=25.0,
    frameSize=output_size)

In a child thread I am performing dense optical flow on a 1080p image:


with time_it("INF: convert image for OF"):
    with streamLeft:
        curFrame = vpi.asimage(np_img, vpi.Format.BGR8) \
            .convert(vpi.Format.NV12_ER, backend=vpi.Backend.CUDA) \
            .convert(vpi.Format.NV12_ER_BL, backend=vpi.Backend.VIC)

if prevFrame is not None:
    with time_it("INF: optical flow"):
        # Calculate the motion vectors from previous to current frame
        with vpi.Backend.NVENC:
            with streamLeft:
                motion_vectors = vpi.optflow_dense(prevFrame, curFrame, quality=vpi.OptFlowQuality.LOW)

My issue is that this drops the output FPS of the main thread from ~30fps to ~12fps.

I understand the codec nvv4l2h264enc is using the NVENC chip, as is the Optical Flow - can the chip not be used in parallel? I am using max power settings. What are the options for CPU video encoding instead?

many thanks
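For the CPU encoding question, here is a minimal sketch of the same pipeline with the hardware encoder swapped for x264enc, the software H.264 encoder from GStreamer's x264 plugin. The function name, the default bitrate, and the placeholder URL are assumptions for illustration, not values from the original script. Whether the CPU can sustain 1080p at the target frame rate would need to be measured on the device.

```python
def cpu_gstreamer_out(rtmp_url, fps=25, bitrate_kbps=4000):
    """Build a GStreamer pipeline string that encodes H.264 on the CPU."""
    return (
        "appsrc ! "
        "videoconvert ! "
        f"video/x-raw, framerate=(fraction){fps}/1, format=I420 ! "
        # x264enc runs on the CPU, so it does not use the NVENC engine.
        # bitrate is in kbit/s; key-int-max gives a keyframe every ~2 seconds.
        f"x264enc tune=zerolatency speed-preset=ultrafast "
        f"bitrate={bitrate_kbps} key-int-max={2 * fps} ! "
        "h264parse ! "
        "flvmux streamable=true ! "
        "queue leaky=downstream ! "
        f"rtmpsink location={rtmp_url} sync=false"
    )


# Placeholder endpoint (hypothetical); substitute the real RTMP URL.
pipeline = cpu_gstreamer_out("rtmp://example.invalid/app/STREAM_KEY")
```

The resulting string can be passed to cv2.VideoWriter with cv2.CAP_GSTREAMER in the same way as the hardware pipeline above.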

AastaLLL · April 27, 2023, 2:38am

Hi,

We want to check the NVENC behavior further.
Would you mind sharing the complete source with us?

Thanks.

liellplane · April 27, 2023, 8:47am

sure

VC_Detect_For_Nvidia.py (9.3 KB)

AastaLLL · April 28, 2023, 2:41am

Hi,

Thanks for the sample.
The instructions within the sample are detailed.
We will give it a check and share more information with you later.

Thanks.

liellplane · April 28, 2023, 8:49am

Thanks AastaLLL, I can provide the link to view the RTMP video if necessary.

Also, we are not married to the OpenCV/GStreamer components. As long as we can send 30-40fps 1080p to an RTMP/RTMPS endpoint, that's all we need, so other solutions are very welcome, including CPU encoding as a last resort.

AastaLLL · May 2, 2023, 5:35am

Hi,

You should be able to reproduce similar behavior with a video file as input/output.
If so, could you modify the sample to use video? It will be easier for our internal team to check.

Thanks.

AastaLLL · May 3, 2023, 3:28am

Hi,

We have checked the sample shared on Apr 27.
However, the sample doesn’t call dense optical flow.

Could you double check the file?

Thanks.

liellplane · May 3, 2023, 8:43am

Hi AastaLLL, oops, that was the wrong file! Here is the correct one. The behaviour is a collapse of the output FPS from ~40fps without optical flow down to ~18fps with optical flow. I have not had an opportunity to convert it to use video input/output, as I am not yet clear on how that is done.

VC_OF_Detect_For_Nvidia.py (8.4 KB)

AastaLLL · May 8, 2023, 7:26am

Hi,

Could you share some info about which elapsed time we should focus on?
We tested the script with optical flow on and off. The performance seems to vary across frames:

Optical Flow OFF

VC: upload to GPU (2): 0.301ms
VC: perp processing & sync (2): 0.541ms
VC: output GPU to CPU (1): 1.399ms
VC: put image on queue (2): 0.172ms
INF: get object off queue: 24.199ms
INF: convert image for OF: 5.673ms
VC: draw on rectangles: 6.136ms
INF: get object off queue: 0.148ms
VC: output to mux: 33.566ms
------
VC: upload to GPU (2): 0.354ms
VC: perp processing & sync (2): 0.590ms
INF: convert image for OF: 37.251ms
VC: output GPU to CPU (1): 2.402ms
VC: put image on queue (2): 0.089ms
INF: get object off queue: 23.352ms
VC: draw on rectangles: 0.409ms
INF: convert image for OF: 2.546ms
INF: get object off queue: 0.059ms
INF: convert image for OF: 9.984ms
VC: output to mux: 12.066ms

Optical Flow ON

VC: upload to GPU (2): 0.828ms
VC: perp processing & sync (2): 5.917ms
VC: output GPU to CPU (1): 3.388ms
VC: draw on rectangles: 0.369ms
INF: optical flow: 63.797ms
INF: get object off queue: 0.056ms
INF: convert image for OF: 0.884ms
VC: output to mux: 46.051ms
------
VC: upload to GPU (2): 0.222ms
VC: perp processing & sync (2): 0.546ms
VC: output GPU to CPU (1): 1.207ms
VC: draw on rectangles: 0.359ms
VC: output to mux: 12.077ms

liellplane · May 9, 2023, 8:50am

Hi AastaLLL, and that's great that you ran some tests.

Yes, “output to mux” is the OpenCV video writer with GStreamer, which as far as I know uses the NVENC chip to encode the frames.

With OF off, that output stays below 10ms and we get an output FPS on Mux (our video endpoint service) of ~42. With OF on, we hit maximums of 42ms+ every few frames and an FPS on Mux of 20.
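As a rough check on these numbers, each frame has a time budget of 1000 / FPS milliseconds, and a write that takes longer than the budget cannot keep up with that rate on its own. The sketch below applies this to the "VC: output to mux" samples from the logs earlier in the thread; the function names are hypothetical and two samples per case are far too few to draw conclusions from.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at the target rate."""
    return 1000.0 / target_fps


def over_budget(samples_ms, target_fps):
    """Return the samples that exceed the per-frame budget."""
    budget = frame_budget_ms(target_fps)
    return [s for s in samples_ms if s > budget]


# "VC: output to mux" times copied from the logs above
of_off = [33.566, 12.066]
of_on = [46.051, 12.077]

slow_off = over_budget(of_off, target_fps=40)
slow_on = over_budget(of_on, target_fps=40)
```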

AastaLLL · May 10, 2023, 3:28am

Hi,

We turned off optical flow by changing the condition as shown below:

if 0:#prevFrame is not None:

So the CUDA/VIC conversion for OF still runs.

Even in this case, we can still see occasional latency when writing the image to mux.
So the cause might not be NVENC but the extra load added by the OF path.

Thanks.
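To separate the cost of the conversion from the cost of the flow itself, each stage could be toggled on its own rather than with a single `if 0:`. This is only a sketch: Stages, process, convert_fn and flow_fn are hypothetical names, and the two callables stand in for the VPI conversion and vpi.optflow_dense calls in the script.

```python
from dataclasses import dataclass


@dataclass
class Stages:
    convert: bool = True  # CUDA/VIC format conversion for optical flow
    flow: bool = True     # dense optical flow itself


def process(frame, prev, stages, convert_fn, flow_fn):
    """Run only the enabled stages; return (current frame, motion vectors)."""
    cur = convert_fn(frame) if stages.convert else None
    vectors = None
    # Flow needs both a converted current frame and a previous frame
    if stages.flow and cur is not None and prev is not None:
        vectors = flow_fn(prev, cur)
    return cur, vectors
```

Timing the mux write with Stages(convert=False, flow=False), then Stages(convert=True, flow=False), then both enabled, would show which stage introduces the spikes.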

liellplane · May 10, 2023, 8:25am

Hi AastaLLL, this is well spotted. Any suggestions on how to speed up the conversion, or is this already the optimal setting?

AastaLLL · May 18, 2023, 6:12am

Hi,

Sorry for the late update.

It seems there are several CPU ↔ GPU buffer transfers in your pipeline.
Maybe you can try our jetson-utils which can read the camera to a GPU buffer to see if it helps.

https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-streaming.md
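A minimal capture-and-render loop with the jetson-utils Python bindings might look like the sketch below. It follows the videoSource / videoOutput pattern from the jetson-inference examples; the function name, the example URIs and the max_frames parameter are assumptions. This thread does not confirm that videoOutput can reach an RTMP endpoint, and handing the captured image to VPI is not covered here.

```python
def stream(source_uri, sink_uri, max_frames=None):
    """Copy frames from a jetson-utils source to a sink; return frames sent."""
    # Imported lazily so this file can be loaded without jetson-utils installed
    from jetson_utils import videoSource, videoOutput

    source = videoSource(source_uri)  # e.g. "csi://0" or "/dev/video0"
    sink = videoOutput(sink_uri)      # e.g. "rtp://<remote-ip>:1234"

    frames = 0
    while max_frames is None or frames < max_frames:
        img = source.Capture()        # frame stays in GPU-accessible memory
        if img is not None:           # None indicates a capture timeout
            sink.Render(img)
            frames += 1
        if not source.IsStreaming() or not sink.IsStreaming():
            break
    return frames
```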


Thanks.

liellplane · May 23, 2023, 7:43am

Hi AastaLLL

I didn’t realise there were several CPU-GPU transfers. I thought they only happened when I loaded the test image and when I prepared it to send to our streaming platform.

I will try with a real camera and see if that makes a difference!

Many thanks

system · June 14, 2023, 3:22am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
