GStreamer high CPU usage and tearing

I am currently using OpenCV with GStreamer to display video feeds from Ethernet cameras through a managed switch, and I am seeing significant tearing, which I believe comes down to high CPU usage.

My Python script is as follows:

import numpy as np
import cv2

# Raw RTP video over multicast; appsink hands the frames to OpenCV
pipe = "udpsrc multicast-iface=eth0 address=239.192.1.32 port=5004 ! application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8, width=(string)640, height=(string)480 ! rtpvrawdepay ! videoconvert ! appsink"

pipe0 = "udpsrc multicast-iface=eth0 address=239.192.1.150 port=5004 ! application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8, width=(string)1024, height=(string)768 ! queue ! rtpvrawdepay ! videoconvert ! appsink"

pipe1 = "udpsrc multicast-iface=eth0 address=239.192.7.150 port=5004 ! application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, depth=(string)8, width=(string)1024, height=(string)768 ! queue ! rtpvrawdepay ! videoconvert ! appsink"


# Explicitly select the GStreamer backend
cap = cv2.VideoCapture(pipe1, cv2.CAP_GSTREAMER)
cap0 = cv2.VideoCapture(pipe0, cv2.CAP_GSTREAMER)

while True:
    # Capture frame-by-frame; skip the iteration if either read fails
    ret, frame = cap.read()
    ret0, frame0 = cap0.read()
    if not (ret and ret0):
        continue

    # Display the resulting frames
    cv2.imshow('frame', frame)
    cv2.imshow('frame0', frame0)

    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

# When everything is done, release both captures
cap.release()
cap0.release()
cv2.destroyAllWindows()
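Since the three pipeline strings above differ only in address and resolution, they can be generated by a small helper. This is just a sketch of that idea; it also adds two hedged tweaks worth trying: requesting BGR from videoconvert so OpenCV does not perform a second conversion, and setting sync=false drop=true max-buffers=1 on appsink so stale frames are dropped instead of queued (standard GStreamer properties, but whether they help here is an assumption):

```python
def make_pipeline(address, width, height, port=5004, iface="eth0"):
    """Build a raw-RTP receive pipeline string for one multicast camera.

    Asks videoconvert for BGR directly (the layout OpenCV wants) and
    tells appsink to keep only the newest buffer rather than queueing.
    """
    return (
        "udpsrc multicast-iface={iface} address={addr} port={port} ! "
        "application/x-rtp, media=(string)video, clock-rate=(int)90000, "
        "encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, "
        "depth=(string)8, width=(string){w}, height=(string){h} ! "
        "queue ! rtpvrawdepay ! videoconvert ! "
        "video/x-raw, format=(string)BGR ! "
        "appsink sync=false drop=true max-buffers=1"
    ).format(iface=iface, addr=address, port=port, w=width, h=height)

# e.g. pipe1 = make_pipeline("239.192.7.150", 1024, 768)
#      cap = cv2.VideoCapture(pipe1, cv2.CAP_GSTREAMER)
```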

While running this script, CPU usage on the first three cores is upwards of 95%, with the other three upwards of 60%. Limiting the script to a single video feed reduces the tearing and the CPU load, though slight issues remain.

This is the result of displaying two cameras within OpenCV.

I have created a pcap file of the two incoming UDP streams; playing it back shows no issues, which implies the network is not the problem.

My TX2 is set to nvpmodel 0 with all six cores running at 2 GHz.

Performing this task on a laptop with an i5 processor works significantly better.

Any suggestions on how to improve performance?

Who is multicasting the data to you?

Sending raw data at video frame rates is a lot of data to shuffle through the network stacks. I wouldn’t be surprised if there are several levels of interrupts and memory copies involved; this is likely not a highly optimized path, especially compared to using CSI cameras (direct attached) using DMA to talk to the GPU hardware, which is more the typical Jetson use case.

If you run a system profiler, where do you see the system spend most of its CPU time when the load is this high?
You can use a generic Linux tool such as oprofile, or perhaps ARM-specific embedded tracing, which might tell you more about where the bottleneck is.
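Before reaching for a full profiler, a quick per-core check can already show whether the load is spread evenly or one core is pegged (interrupt handling for the NIC often pins a single core). A minimal sketch using only /proc/stat, so it needs no extra tools; the function name and interface are my own:

```python
def core_busy_fraction(sample_then, sample_now):
    """Given two readings of the same 'cpuN' line from /proc/stat,
    taken some interval apart, return the fraction of that interval
    the core spent busy (i.e. not idle and not waiting on I/O)."""
    def totals(line):
        fields = [int(x) for x in line.split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait jiffies
        return sum(fields), idle

    total0, idle0 = totals(sample_then)
    total1, idle1 = totals(sample_now)
    elapsed = total1 - total0
    if elapsed == 0:
        return 0.0
    return (elapsed - (idle1 - idle0)) / elapsed

# Usage on Linux: read /proc/stat, sleep a second, read it again,
# and feed matching 'cpu0', 'cpu1', ... lines to core_busy_fraction().
```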

(I don’t know exactly which of these the Jetson supports, but it’s worth a read: https://community.arm.com/tools/b/blog/posts/performance-analysis-on-arm-embedded-linux-and-android-systems---by-javier-orensanz )

Hi cmclar204, your pipeline runs entirely on the CPU; no Tegra hardware functions are leveraged.

Here is a use case of opening the onboard camera:
https://devtalk.nvidia.com/default/topic/987537/jetson-tx1/videocapture-fails-to-open-onboard-camera-l4t-24-2-1-opencv-3-1/post/5064902/#5064902

Please realize these are CPU-based frameworks, so we may not be able to achieve the same performance as pure CPU platforms.
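One commonly suggested way to pull some of this work onto the Tegra hardware is to insert nvvidconv for the colour-space conversion. The sketch below is an assumption, not a verified pipeline: nvvidconv cannot emit BGR directly, so a (much cheaper) BGRx-to-BGR videoconvert step remains, and whether nvvidconv accepts the exact raw format produced by rtpvrawdepay depends on the L4T release:

```python
# Hypothetical variant of pipe1 that routes conversion through nvvidconv.
# The NVMM caps keep the intermediate buffers in hardware-accessible memory.
hw_pipe = (
    "udpsrc multicast-iface=eth0 address=239.192.7.150 port=5004 ! "
    "application/x-rtp, media=(string)video, clock-rate=(int)90000, "
    "encoding-name=(string)RAW, sampling=(string)YCbCr-4:2:2, "
    "depth=(string)8, width=(string)1024, height=(string)768 ! "
    "queue ! rtpvrawdepay ! "
    "nvvidconv ! video/x-raw(memory:NVMM), format=(string)I420 ! "
    "nvvidconv ! video/x-raw, format=(string)BGRx ! "
    "videoconvert ! video/x-raw, format=(string)BGR ! "
    "appsink sync=false drop=true max-buffers=1"
)
```

Testing with gst-launch-1.0 first (replacing appsink with fakesink) would confirm whether the caps negotiate before wiring it into OpenCV.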

oprofile produces:

CPU_CYCLES:100000|
  samples|      %|
------------------
   921858 100.000 python2.7
	CPU_CYCLES:100000|
	  samples|      %|
	------------------
	   275213 29.8542 no-vmlinux
	   132489 14.3720 libgstvideo-1.0.so.0.803.0
	   112099 12.1601 libgobject-2.0.so.0.4800.1
	   106282 11.5291 libglib-2.0.so.0.4800.1
	    78315  8.4953 libc-2.23.so
	    53593  5.8136 libgstreamer-1.0.so.0.803.0
	    31697  3.4384 libgio-2.0.so.0.4800.1
	    22683  2.4606 libpthread-2.23.so
	    21987  2.3851 libgdk-x11-2.0.so.0.2400.30
	    15294  1.6590 libgstcoreelements.so
	    14222  1.5428 libgstbase-1.0.so.0.803.0
	    13746  1.4911 libopencv_imgcodecs.so.3.3.0
	    11507  1.2482 libgstudp.so
	     9640  1.0457 libgstrtp.so

The data is being multicast through a managed switch, with the cameras attached via gigabit Ethernet. I have tested this with packet tracers and other clients, and the connection seems fine. I realise that transmitting raw video is very data intensive, but at this point in time there's unfortunately no other option.

I’m currently leaning towards getting a CPU board to process the video into a more manageable format and send it on to the TX2 module for analysis.