Is the CPU busy when the Jetson Nano's hardware decoder is doing the work?

Hi guys,
I have some questions about the hardware decoder of the Jetson Nano.

1- I would like clean example code for multi-stream decoding in Python.
2- When the Nano is decoding multiple streams on its dedicated decoding hardware, is the CPU also busy with this task? If so, why? Isn't this process handled entirely by the hardware decoder?

Hi,
You can run sudo tegrastats to get the system status. If you see NVDEC in tegrastats, hardware decoding is in use.

We enable hardware acceleration in tegra_multimedia_api and GStreamer. If you use the GStreamer nvv4l2decoder element in your Python code, it is hardware decoding.
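
A minimal sketch of what that looks like in Python (assuming an OpenCV build with GStreamer support; the URI is a placeholder):

import cv2

# Hypothetical RTSP source; replace with your camera URI.
uri = "rtsp://192.168.1.101:8554/stream"

# nvv4l2decoder decodes H.264 on NVDEC; nvvidconv converts the decoded
# NVMM buffer to a CPU BGRx buffer, and videoconvert produces the BGR
# frames that OpenCV expects from appsink.
pipeline = (
    f"rtspsrc location={uri} latency=300 ! rtph264depay ! h264parse ! "
    "nvv4l2decoder ! nvvidconv ! video/x-raw,format=BGRx ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # frame is a regular BGR numpy array here
cap.release()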

Thanks a lot.

I use JetPack 4.2 and OpenCV 3.4.

1- When I use the GStreamer elements below in OpenCV, I get these results:
CPU usage is 30-45% when decoding 8 streams at 1920x1080, and I can even decode 9 streams at 1920x1080. But the Jetson Nano supports only 8 streams, right? How can that happen?

rtspsrc location={uri} latency={latency} ! rtph265depay ! h265parse ! omxh265dec !
nvvidconv ! video/x-raw, width=(int){width}, height=(int){height}, format=(string)BGRx ! videoconvert ! appsink

sudo tegrastats:

RAM 447/3964MB (lfb 638x4MB) SWAP 0/10174MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [30%@1428,25%@1428,35%@1428,37%@1428] EMC_FREQ 11%@1600 GR3D_FREQ 0%@921 APE 25 PLL@37C CPU@40.5C PMIC@100C GPU@37.5C AO@47C thermal@38.75C POM_5V_IN 4534/4319 POM_5V_GPU 166/166 POM_5V_CPU 2450/2246

2- When I use the GStreamer elements below:
I can decode 8-9 streams, but even with only 4 streams the CPU usage reaches 100%.

rtspsrc location={} latency={} ! rtph265depay ! h265parse ! avdec_h265 ! videoconvert ! appsink

Q1 - I think case 1 uses the hardware accelerator to decode the streams and case 2 definitely uses CPU decoding, right?

Q2 - In your opinion, should I add nvv4l2decoder to the case 1 pipeline?

Q3 - I saw some blog posts saying that omxh265dec is for hardware decoding and avdec_h265 is for CPU decoding. What's your opinion?

Q1

Yes, you have that right. A full guide to the accelerated elements can be found at the link below.

In your opinion, should I add nvv4l2decoder to the case 1 pipeline?

According to the documentation, the OMX decoders are deprecated, so you might want to use nvv4l2decoder anyway.

When the Nano is decoding multiple streams on its dedicated decoding hardware, is the CPU also busy with this task?

No. With CUDA, your GPU can be doing something (even many things) while your CPU does something else entirely. Here are some examples from PyTorch of how this can work.
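
As a small illustration of that asynchrony (a sketch, assuming a CUDA-enabled PyTorch install; this is about CUDA in general, not the Nano's decoder specifically):

import torch

x = torch.randn(2048, 2048, device="cuda")

# Kernel launches are asynchronous: this returns as soon as the matmul
# is queued on the GPU, not when it finishes.
y = x @ x

# The CPU is free to do unrelated work while the GPU computes.
cpu_total = sum(i * i for i in range(1_000_000))

# Block only at the point where the GPU result is actually needed.
torch.cuda.synchronize()
print(y.sum().item(), cpu_total)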

Thanks.
Regarding Q3, what I mean is: as you can see in my sudo tegrastats output, there is still CPU usage. Why, when I am only running Python code for multi-stream hardware decoding? Why is the CPU busy? Is this normal, or is it caused by the decoding process?

Hi,
Please check

You have enabled hardware decoding, but there is a memcpy() between GStreamer and OpenCV. This consumes CPU.

Do you mean that I am not using a zero-copy method, and that the data is duplicated in memory? How can I fix this? If I set appsink as the GStreamer sink element, can the problem be solved?

I use these GStreamer elements in OpenCV, and the sudo tegrastats result is as shown in case 1 above:

gst_str = f'rtspsrc location={uri} latency={latency} ! rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw, width=(int){width}, height=(int){height}, format=(string)BGRx ! videoconvert ! appsink'

cv2.VideoCapture(gst_str, cv2.CAP_GSTREAMER)
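
For reference, a rough multi-stream version of this (one capture thread per stream; a sketch assuming OpenCV with GStreamer support, and the URIs are hypothetical):

import threading
import cv2

def open_stream(uri, width=1920, height=1080, latency=300):
    # Same hardware-decode pipeline as above, parameterized per stream.
    return cv2.VideoCapture(
        f"rtspsrc location={uri} latency={latency} ! rtph264depay ! h264parse ! "
        f"omxh264dec ! nvvidconv ! "
        f"video/x-raw, width=(int){width}, height=(int){height}, format=(string)BGRx ! "
        f"videoconvert ! appsink",
        cv2.CAP_GSTREAMER,
    )

def reader(uri):
    cap = open_stream(uri)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # process frame here
    cap.release()

uris = [f"rtsp://192.168.1.101:8554/cam{i}" for i in range(8)]  # hypothetical
threads = [threading.Thread(target=reader, args=(u,), daemon=True) for u in uris]
for t in threads:
    t.start()
for t in threads:
    t.join()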

Hi,

When using appsink in OpenCV, this cannot be eliminated. The optimal pipeline on the Jetson Nano keeps NVMM buffers from source to sink. OpenCV is a CPU-based stack and only accepts CPU buffers in appsink.

An optimal solution is to leverage CUDA APIs and tegra_multimedia_api. Please refer to

The optimal pipeline on the Jetson Nano keeps NVMM buffers from source to sink

Should I add video/x-raw(memory:NVMM) to the GStreamer elements?

I want to use RTSP streaming in Python code. I don't know whether I can use tegra_multimedia_api in Python code or not.

Hi,

No, tegra_multimedia_api is not supported in Python.

The CPU usage is explained above. If you have to use appsink in OpenCV, please be aware of this overhead.

Thanks a lot,
What's the difference between video/x-raw(memory:NVMM) and video/x-raw?
Please give me an optimal set of GStreamer elements for decoding multiple streams with hardware decoding in a Python application.

Here are some Python examples and notebooks:

However, if your idea is to modify the buffer (the image itself) within Python, that's currently not possible. You can modify the metadata (e.g. bounding box coordinates of detections, labels, etc.), but not the image. It would be very slow in Python.

If you want to use OpenCV within DeepStream, there is an example plugin that does exactly that; however, it's written in C++. You could then use that plugin within Python.

That being said, your plugin will be much faster if you avoid any memory copies and keep the buffer in NVMM, which is GPU memory. So if you do use the example plugin linked above and you know some CUDA, you may wish to tear out the OpenCV Mat conversion. As DaneLLL mentions, memory copies are expensive.
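
To make the NVMM point concrete, here is a sketch of the two caps side by side (pipeline strings only; the RTSP location is a placeholder):

# Frames stay in NVMM (DMA/GPU-accessible) buffers from decoder to sink:
# no copy into system memory, so very little CPU work per frame.
nvmm_pipeline = (
    "rtspsrc location=rtsp://<camera> latency=300 ! rtph264depay ! h264parse ! "
    "nvv4l2decoder ! nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! "
    "nvoverlaysink"
)

# Plain video/x-raw means a CPU buffer: nvvidconv must copy each decoded
# frame out of NVMM, and that per-frame memcpy is the CPU usage you see
# whenever appsink/OpenCV is the consumer.
cpu_pipeline = (
    "rtspsrc location=rtsp://<camera> latency=300 ! rtph264depay ! h264parse ! "
    "nvv4l2decoder ! nvvidconv ! video/x-raw,format=BGRx ! "
    "videoconvert ! appsink"
)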

Hi
I use the omxh264dec element in GStreamer and I see NVDEC in tegrastats; that means the Nano is using the hardware decoder, right?
But I don't know why, when I use nvv4l2decoder, the decoding stops. I think I have to use a different combination of elements with nvv4l2decoder, right?

gst-launch-1.0 rtspsrc location=rtsp latency=300 ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! 'video/x-raw(memory:NVMM)', 'format=(string)BGRx' ! videoconvert ! fakesink

Hi,
When NVDEC is shown in tegrastats, it means the hardware decoder is in use.

The pipeline doesn't look right. You should send video/x-raw (a CPU buffer, not NVMM) to videoconvert. Please try:

gst-launch-1.0 rtspsrc location=rtsp latency=300 ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw,format=(string)BGRx ! videoconvert ! fakesink

I ran your suggested pipeline and got the log below, but NVDEC is not shown, and the CPU usage is stuck at zero.
When I use omxh264dec instead of nvv4l2decoder, NVDEC is shown.

jnano@jnano-desktop:~$ gst-launch-1.0 rtspsrc location=rtsp://192.168.1.101:8554/1920x1080.264 latency=300 ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! 'video/x-raw,format=(string)BGRx' ! videoconvert ! fakesink
nvbuf_utils: Could not get EGL display connection
Setting pipeline to PAUSED …
Opening in BLOCKING MODE
Pipeline is live and does not need PREROLL …
Progress: (open) Opening Stream
Progress: (connect) Connecting to rtsp://192.168.1.101:8554/1920x1080.264
Progress: (open) Retrieving server options
Progress: (open) Retrieving media info
Progress: (request) SETUP stream 0
Progress: (open) Opened Stream
Setting pipeline to PLAYING …
New clock: GstSystemClock
Progress: (request) Sending PLAY request
Progress: (request) Sending PLAY request
Progress: (request) Sent PLAY request
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261

(gst-launch-1.0:8568): GStreamer-CRITICAL **: 18:38:57.211: gst_mini_object_unref: assertion 'mini_object != NULL' failed
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261

Another question:
I want to get the decoded GStreamer frames into my Python code. How can I solve this?
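
A rough sketch of one way to do this with GStreamer's own Python bindings (assuming python3-gi is installed and the same elements as above; the URI is a placeholder):

import numpy as np
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "rtspsrc location=rtsp://<camera> latency=300 ! rtph264depay ! h264parse ! "
    "nvv4l2decoder ! nvvidconv ! video/x-raw,format=BGRx ! "
    "appsink name=sink emit-signals=true max-buffers=2 drop=true"
)
sink = pipeline.get_by_name("sink")
pipeline.set_state(Gst.State.PLAYING)

sample = sink.emit("pull-sample")        # blocks until a frame arrives
buf = sample.get_buffer()
caps = sample.get_caps().get_structure(0)
w, h = caps.get_value("width"), caps.get_value("height")
ok, mapinfo = buf.map(Gst.MapFlags.READ)
frame = np.frombuffer(mapinfo.data, np.uint8).reshape(h, w, 4)  # BGRx
buf.unmap(mapinfo)
pipeline.set_state(Gst.State.NULL)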

Hi,
Please download one of the video files from the page below and try:
http://jell.yfish.us/

$ gst-launch-1.0 filesrc location= jellyfish-5-mbps-hd-h264.mkv ! matroskademux ! h264parse ! nvv4l2decoder ! nvoverlaysink

If you don't see NVDEC in sudo tegrastats, we suggest upgrading to JetPack 4.2.3 or 4.3.

I get this error:

nvbuf_utils: Could not get EGL display connection
Setting pipeline to PAUSED …
Opening in BLOCKING MODE
Pipeline is PREROLLING …
ERROR: from element /GstPipeline:pipeline0/GstFileSrc:filesrc0: Internal data stream error.
Additional debug info:
gstbasesrc.c(3055): gst_base_src_loop (): /GstPipeline:pipeline0/GstFileSrc:filesrc0:
streaming stopped, reason error (-5)
ERROR: pipeline doesn’t want to preroll.
Setting pipeline to NULL …
Freeing pipeline …

Please upgrade to JetPack 4.2.3 or 4.3.