GStreamer nvvidconv performance/cost (UYVY to NV12 for nvv4l2h264enc)

Hello,

When I run the following simplified GStreamer pipeline on NX with JP 5.0.2:

gst-launch-1.0 nvv4l2camerasrc ! 'video/x-raw(memory:NVMM),format=UYVY,width=1280,height=720' ! fakesink

CPU usage stays roughly the same as at idle, and VDD_IN is about 4325 mW (also almost the same as at idle).

But when I run:

gst-launch-1.0 nvv4l2camerasrc ! 'video/x-raw(memory:NVMM),format=UYVY,width=1280,height=720' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! fakesink

Then a single CPU core runs at ~75% and VDD_IN jumps to 5506 mW. Since UYVY has to be converted to NV12 before any further processing, whether encoding or manipulation, am I doing something wrong? I was expecting this pipeline to run in hardware (GPU/DMA) without any CPU copying.

Thank you!

Hi,
For reference, please share the tegrastats output for both cases. Please execute

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Then run sudo tegrastats to get the system status.

Hello @DaneLLL!

Thank you for your prompt response, here are the results:

$ gst-launch-1.0 nvv4l2camerasrc ! 'video/x-raw(memory:NVMM),width=1280,height=720,format=UYVY,framerate=50/1' ! fakesink

RAM 1547/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [3%@1906,0%@1906,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 115 APE 150 AUX@34.5C CPU@36.5C thermal@35.6C AO@36.5C GPU@35.5C PMIC@50C VDD_IN 4674mW/4661mW VDD_CPU_GPU_CV 554mW/554mW VDD_SOC 1307mW/1307mW
RAM 1547/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [0%@1908,1%@1906,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 115 APE 150 AUX@34.5C CPU@37C thermal@35.55C AO@36.5C GPU@35.5C PMIC@50C VDD_IN 4635mW/4657mW VDD_CPU_GPU_CV 515mW/548mW VDD_SOC 1307mW/1307mW
RAM 1547/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [1%@1906,1%@1896,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 115 APE 150 AUX@34.5C CPU@36.5C thermal@35.7C AO@36.5C GPU@36C PMIC@50C VDD_IN 4635mW/4654mW VDD_CPU_GPU_CV 554mW/549mW VDD_SOC 1307mW/1307mW
RAM 1547/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [4%@1910,2%@1929,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 115 APE 150 AUX@35C CPU@36.5C thermal@35.55C AO@36.5C GPU@36C PMIC@50C VDD_IN 4674mW/4656mW VDD_CPU_GPU_CV 554mW/549mW VDD_SOC 1307mW/1307mW
RAM 1547/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [3%@1907,1%@1907,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 115 APE 150 AUX@34.5C CPU@37C thermal@35.55C AO@36.5C GPU@35.5C PMIC@50C VDD_IN 4635mW/4654mW VDD_CPU_GPU_CV 554mW/550mW VDD_SOC 1307mW/1307mW

$ gst-launch-1.0 nvv4l2camerasrc ! 'video/x-raw(memory:NVMM),width=1280,height=720,format=UYVY,framerate=50/1' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! fakesink

RAM 1551/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [8%@1905,78%@1906,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 16%@115 APE 150 AUX@35C CPU@37.5C thermal@36.05C AO@36.5C GPU@36C PMIC@50C VDD_IN 5585mW/5605mW VDD_CPU_GPU_CV 1426mW/1430mW VDD_SOC 1344mW/1344mW
RAM 1551/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [11%@1907,76%@1904,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 18%@115 APE 150 AUX@35C CPU@37.5C thermal@36.05C AO@36.5C GPU@36C PMIC@50C VDD_IN 5585mW/5602mW VDD_CPU_GPU_CV 1423mW/1429mW VDD_SOC 1344mW/1344mW
RAM 1551/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [21%@1909,78%@1905,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 21%@115 APE 150 AUX@35C CPU@37.5C thermal@36.05C AO@36.5C GPU@36C PMIC@50C VDD_IN 5585mW/5600mW VDD_CPU_GPU_CV 1426mW/1429mW VDD_SOC 1344mW/1344mW
RAM 1551/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [21%@1905,77%@1903,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 16%@115 APE 150 AUX@35C CPU@37.5C thermal@36.05C AO@36.5C GPU@36C PMIC@50C VDD_IN 5625mW/5602mW VDD_CPU_GPU_CV 1423mW/1428mW VDD_SOC 1344mW/1344mW
RAM 1551/14908MB (lfb 3112x4MB) SWAP 0/7454MB (cached 0MB) CPU [24%@1907,77%@1907,off,off,off,off] EMC_FREQ 1%@1600 GR3D_FREQ 0%@1109 VIC_FREQ 19%@115 APE 150 AUX@35C CPU@37.5C thermal@36.05C AO@37C GPU@36C PMIC@50C VDD_IN 5585mW/5601mW VDD_CPU_GPU_CV 1426mW/1428mW VDD_SOC 1344mW/1344mW

Please advise.
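To compare the two runs without reading the raw lines by eye, the tegrastats output above can be reduced to a few averages. The following is a minimal sketch, not an NVIDIA tool: the helper names `parse_line` and `summarize` are made up for illustration, the field layout is assumed from the lines posted above, and the first number after VDD_IN is taken as the instantaneous reading. The sample lines are abbreviated copies of the posted output.

```python
import re
from statistics import mean

# Patterns assumed from the tegrastats lines posted in this thread.
CPU_RE = re.compile(r"CPU \[([^\]]*)\]")
VDD_IN_RE = re.compile(r"VDD_IN (\d+)mW/(\d+)mW")


def parse_line(line):
    """Return (per-core load percentages, VDD_IN in mW) for one tegrastats line."""
    cpu = CPU_RE.search(line)
    vdd = VDD_IN_RE.search(line)
    if cpu is None or vdd is None:
        raise ValueError("not a tegrastats line: %r" % line[:40])
    loads = []
    for core in cpu.group(1).split(","):
        if core == "off":
            continue  # powered-down cores report "off" instead of a load
        loads.append(int(core.split("%")[0]))
    return loads, int(vdd.group(1))


def summarize(lines):
    """Average the busiest core, the summed core load, and VDD_IN over all lines."""
    parsed = [parse_line(l) for l in lines if l.strip()]
    return {
        "busiest_core_pct": mean(max(loads) for loads, _ in parsed),
        "total_cpu_pct": mean(sum(loads) for loads, _ in parsed),
        "vdd_in_mw": mean(v for _, v in parsed),
    }


# Abbreviated lines from the two runs above (only the parsed fields are kept).
without_conv = [
    "CPU [3%@1906,0%@1906,off,off,off,off] VDD_IN 4674mW/4661mW",
    "CPU [0%@1908,1%@1906,off,off,off,off] VDD_IN 4635mW/4657mW",
]
with_conv = [
    "CPU [8%@1905,78%@1906,off,off,off,off] VDD_IN 5585mW/5605mW",
    "CPU [11%@1907,76%@1904,off,off,off,off] VDD_IN 5585mW/5602mW",
]

if __name__ == "__main__":
    print("without nvvidconv:", summarize(without_conv))
    print("with nvvidconv:   ", summarize(with_conv))
```

Feeding the full captured logs to `summarize` instead of the abbreviated lists gives the same kind of summary for longer runs.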

Hi,
Do you use a USB camera or a YUV camera? Please share information (brand and model ID) about the camera.

Hi,

It’s not a USB camera but one with a CSI output, and the CSI data path works well: nvv4l2camerasrc captures the video and stores it in NVMM, and the video is displayed successfully with gst-launch-1.0 nvv4l2camerasrc ! 'video/x-raw(memory:NVMM),format=UYVY,width=1280,height=720' ! nvvidconv ! autovideosink.

I think the question is what happens next in the pipeline, in nvvidconv. Shouldn’t the conversion happen in NVMM using the GPU? Why is the CPU being used?

Hi,
The pipeline is optimal and there is no additional memory copy. The CPU usage is unexpectedly high. Could you try the jetson_multimedia_api sample:

/usr/src/jetson_multimedia_api/samples/12_camera_v4l2_cuda

It is similar to nvv4l2camerasrc ! nvvidconv. Please try the sample and see if the high CPU load is still present. We would like to clarify whether the issue is specific to GStreamer.

Hello @DaneLLL,

Ok, so for:

./camera_v4l2_cuda -d /dev/video0 -s 1280x720 -f UYVY -r 50

I’m getting:

$ sudo tegrastats
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [26%@1907,17%@1905,11%@1338,18%@1804,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 35%@204 VIC_FREQ 20%@115 APE 150 AUX@46C CPU@48C thermal@47.05C AO@46C GPU@47.5C PMIC@50C VDD_IN 5753mW/5753mW VDD_CPU_GPU_CV 1230mW/1230mW VDD_SOC 1507mW/1507mW
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [17%@1906,13%@1907,18%@1906,27%@1893,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 46%@204 VIC_FREQ 21%@115 APE 150 AUX@46C CPU@48.5C thermal@47.05C AO@46C GPU@47.5C PMIC@50C VDD_IN 5991mW/5872mW VDD_CPU_GPU_CV 1428mW/1329mW VDD_SOC 1505mW/1506mW
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [29%@1265,28%@1258,19%@1409,15%@1267,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 51%@204 VIC_FREQ 17%@115 APE 150 AUX@46.5C CPU@48.5C thermal@47.4C AO@46C GPU@47.5C PMIC@50C VDD_IN 5872mW/5872mW VDD_CPU_GPU_CV 1309mW/1322mW VDD_SOC 1507mW/1506mW
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [17%@1182,11%@1189,15%@1192,20%@1192,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 50%@204 VIC_FREQ 17%@115 APE 150 AUX@46.5C CPU@48C thermal@47.4C AO@46.5C GPU@47.5C PMIC@50C VDD_IN 5643mW/5814mW VDD_CPU_GPU_CV 1073mW/1260mW VDD_SOC 1507mW/1506mW
RAM 1742/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [27%@1907,26%@1907,31%@1906,36%@1906,off,off] EMC_FREQ 5%@1600 GR3D_FREQ 47%@306 VIC_FREQ 18%@115 APE 150 AUX@46.5C CPU@49C thermal@47.4C AO@46.5C GPU@48C PMIC@50C VDD_IN 6745mW/6000mW VDD_CPU_GPU_CV 2063mW/1420mW VDD_SOC 1584mW/1522mW
RAM 1742/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [23%@1641,34%@1564,16%@1575,27%@1648,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 27%@408 VIC_FREQ 22%@115 APE 150 AUX@46.5C CPU@48.5C thermal@47.55C AO@46.5C GPU@47.5C PMIC@50C VDD_IN 6031mW/6005mW VDD_CPU_GPU_CV 1468mW/1428mW VDD_SOC 1505mW/1519mW
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [16%@1189,27%@1190,21%@1190,11%@1190,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 27%@408 VIC_FREQ 16%@115 APE 150 AUX@46.5C CPU@48.5C thermal@47.25C AO@46.5C GPU@47.5C PMIC@50C VDD_IN 5762mW/5971mW VDD_CPU_GPU_CV 1232mW/1400mW VDD_SOC 1507mW/1517mW
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [18%@1575,19%@1575,21%@1420,17%@1419,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 24%@408 VIC_FREQ 18%@115 APE 150 AUX@46C CPU@48.5C thermal@47.4C AO@46.5C GPU@47.5C PMIC@50C VDD_IN 5723mW/5940mW VDD_CPU_GPU_CV 1192mW/1374mW VDD_SOC 1468mW/1511mW
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [25%@1707,28%@1719,13%@1724,16%@1724,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 7%@408 VIC_FREQ 19%@115 APE 150 AUX@46.5C CPU@48.5C thermal@47.4C AO@46.5C GPU@47.5C PMIC@50C VDD_IN 5762mW/5920mW VDD_CPU_GPU_CV 1271mW/1362mW VDD_SOC 1507mW/1510mW
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [32%@1621,39%@1906,2%@1907,5%@1190,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 29%@306 VIC_FREQ 21%@115 APE 150 AUX@46.5C CPU@48.5C thermal@47.4C AO@46C GPU@47.5C PMIC@50C VDD_IN 5723mW/5900mW VDD_CPU_GPU_CV 1232mW/1349mW VDD_SOC 1507mW/1510mW
RAM 1742/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [13%@1905,34%@1190,14%@1190,16%@1517,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 32%@306 VIC_FREQ 20%@115 APE 150 AUX@46C CPU@48.5C thermal@47.4C AO@46C GPU@47.5C PMIC@50C VDD_IN 5713mW/5883mW VDD_CPU_GPU_CV 1192mW/1335mW VDD_SOC 1507mW/1510mW
RAM 1741/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [33%@1267,32%@1266,7%@1276,7%@1504,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 31%@306 VIC_FREQ 16%@115 APE 150 AUX@46.5C CPU@48C thermal@47.4C AO@46.5C GPU@47.5C PMIC@50C VDD_IN 5683mW/5866mW VDD_CPU_GPU_CV 1192mW/1323mW VDD_SOC 1507mW/1509mW
RAM 1742/14908MB (lfb 1911x4MB) SWAP 0/7454MB (cached 0MB) CPU [27%@1495,13%@1497,5%@1493,31%@1555,off,off] EMC_FREQ 4%@1600 GR3D_FREQ 28%@306 VIC_FREQ 17%@115 APE 150 AUX@46C CPU@48.5C thermal@47.4C AO@46.5C GPU@47.5C PMIC@50C VDD_IN 5802mW/5861mW VDD_CPU_GPU_CV 1269mW/1319mW VDD_SOC 1507mW/1509mW

Is this giving any indication?

Thank you!

Hi,
Does the camera sensor have a 25fps or 30fps mode you could try? The issue may be specific to the 50fps mode.

Hi,

The driver can currently do only 50fps, but I fail to see the connection to fps. Can you please explain how the frame rate causes nvvidconv to use the CPU?

Can you please try the following on the latest JP 5.0.2?

gst-launch-1.0 nvv4l2camerasrc ! 'video/x-raw(memory:NVMM),format=UYVY,width=1280,height=720' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! fakesink

Thank you.

Hi @DaneLLL,

Ok, some progress… the following results in low CPU and power consumption:

gst-launch-1.0 nvv4l2camerasrc ! 'video/x-raw(memory:NVMM),format=UYVY,width=1280,height=720,framerate=50/1' ! videorate ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! fakesink

So adding videorate before nvvidconv is the only thing that stops it from using the CPU for the conversion. Placing videorate after nvvidconv gives the same high CPU usage as omitting ‘videorate’.

This might be a workaround, since a ‘videorate’ that doesn’t change the framerate is probably “free” in terms of resources (?). Still, can you please escalate this issue to an engineer?

Thanks!
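For anyone who wants to A/B the two variants repeatedly, the commands can be generated rather than retyped. This is only a sketch: `build_pipeline` and `gst_launch_command` are hypothetical helper names, the caps strings are copied from the commands in this thread, and nothing here explains why the placement of videorate matters.

```python
import shlex

# Caps copied from the pipelines posted in this thread.
CAPS_IN = "video/x-raw(memory:NVMM),format=UYVY,width=1280,height=720,framerate=50/1"
CAPS_OUT = "video/x-raw(memory:NVMM),format=NV12"


def build_pipeline(videorate_before_conv=True):
    """Return the pipeline description, optionally with videorate before nvvidconv."""
    stages = ["nvv4l2camerasrc", CAPS_IN]
    if videorate_before_conv:
        stages.append("videorate")  # the placement reported above as the workaround
    stages += ["nvvidconv", CAPS_OUT, "fakesink"]
    return " ! ".join(stages)


def gst_launch_command(pipeline):
    """Quote each stage for the shell so the parentheses in the caps survive."""
    parts = [shlex.quote(p) for p in pipeline.split(" ! ")]
    return "gst-launch-1.0 " + " ! ".join(parts)


if __name__ == "__main__":
    # Print both variants; run each one while watching tegrastats.
    print(gst_launch_command(build_pipeline(videorate_before_conv=True)))
    print(gst_launch_command(build_pipeline(videorate_before_conv=False)))
```

The printed commands can be pasted into a terminal one at a time while sudo tegrastats runs in another.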

Hi,
We are checking this with our teams and will update. If adding the videorate plugin works fine, please use it as a quick solution.