How to reduce CPU load factor in opencv + gstreamer

My camera on the TX2 is streamed to a server via RTSP.
For the camera-reading part I used /usr/src/jetson_multimedia_api/samples/12_camera_v4l2_cuda (V4L2).
The camera data is read with V4L2 and then pushed out through a GStreamer pipeline opened by cv::VideoWriter:

cv::VideoWriter rtspWriter;
std::string gstOut = "appsrc ! videoconvert ! omxh264enc ! video/x-h264, stream-format=byte-stream ! rtspclientsink retry=2 protocols=tcp async-handling=true location=\"rtsp://rtsp_server/live/stream0\"";
rtspWriter.open(gstOut, 0, 25.0, cv::Size(1280, 720), true);

videoconvert seems to take up a lot of CPU resources, how should I optimize it?

Sorry, English is not my first language, thank you for any advice

Since OpenCV uses the BGR format, which is not supported by the hardware engines in the Jetson chip, some CPU usage is unavoidable in this case. An optimal solution is to run a GStreamer pipeline and use an OpenCV CUDA filter. There is a sample for this use case:
Nano not using GPU with gstreamer/python. Slow FPS, dropped frames - #8 by DaneLLL

Please check the sample and see if you can apply it to your usecase. If you have to run with cv::VideoWriter, please execute sudo nvpmodel -m 0 and sudo jetson_clocks to get max throughput of CPU cores.

RGB → YUV conversion with videoconvert is very CPU expensive. You may use HW conversion with nvvidconv. However, the latter doesn't support BGR, but it does support BGRx or RGBA. So you may also try:

appsrc ! queue ! videoconvert ! video/x-raw,format=BGRx ! nvvidconv ! omxh264enc ! video/x-h264,stream-format=byte-stream ! h264parse ! ...

Thank you both very much for your answers, I will try as soon as possible

Hello, I have some other questions. I tested something new with GStreamer and would like to ask.

This is my pipeline

Pushing streams
gst-launch-1.0 nvv4l2camerasrc device=/dev/video0 ! 'video/x-raw(memory:NVMM),width=(int)1280,height=(int)720,framerate=(fraction)25/1' ! nvvidconv ! omxh264enc ! 'video/x-h264, stream-format=(string)byte-stream' ! rtspclientsink protocols=tcp latency=0 async-handling=true location="rtsp://rtspServer/live/stream0"

gst-launch-1.0 rtspsrc location=rtsp://rtspServer/live/stream0 protocols=tcp latency=0 ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! autovideosink

If I use the following pipeline to push the video stream instead, I don't have the big-latency problem, but the CPU load is higher:

gst-launch-1.0 v4l2src device="/dev/video0" ! video/x-raw,width=1280,height=720 ! videoconvert ! omxh264enc ! rtspclientsink protocols=tcp async-handling=true latency=0 location="rtsp://rtspServer/live/stream0"

Because I have 5 cameras, this pipeline is not suitable.

Did I do something wrong? I would appreciate your help!

You may better explain the big-latency problem; it's hard to tell what causes it without knowing your camera.

I also think that 5 videoconvert instances that are CPU only would probably not be a good solution, so the NVMM path is probably better.
I assume your camera provides UYVY format; I'm not sure what framerates it supports, since you only set 25 fps in this first pipeline.
You may add -v flag to gst-launch-1.0 so that you can see what caps are used between plugins.

You may also try the nvv4l2h264enc plugin instead of omxh264enc (OMX plugins are being deprecated on Jetson).

There may be some ways to deal with latency that depend on encoders.
Also be aware that setting latency=0 may not be the best option, so you may allow a few frames of latency.
You may also try to set sync=false for rtspclientsink if you don’t need sync.