Finding the bottleneck in video stitching application

Hi Shane,

The tegrastats I posted are from my dummy stitching process (or do you refer to something different?). As with nvcamerasrc, I can grab 5 cameras at 30 FPS with Argus if I do not perform stitching. However, there is then almost no performance left for computations without losing frames.

There is definitely no problem with frame synchronization (or with providing frames to the application). If I just start the Argus OutputStream without acquiring frames and instead push an uninitialized buffer to the stitching functions, the application slows down massively.

Could it be that there is some problem with the ISP?
I was expecting the TX1 to handle all the video processing on its dedicated hardware, which should lead to low ARM load.

Hi crossfire5,
If you can get 5 cameras at 30 fps without stitching, that proves there is no problem with the ISP. The frame rate drops because the pipeline is blocked by the stitching process. I guess you need to go through the stitching class and break it down.

And I do not quite understand what you mean.

Hi Shane,

In my application I can run the modules separately. I have 3 main modules:

  • Camera Grabbing
  • Stitching
  • Encoding

Now I can simply swap the camera module for a fake camera module, which just allocates buffers and pushes them to the stitching module. I can also start the Argus camera module without acquiring frames from it.
Running fake frame provider -> stitching -> encoding works at 30 FPS. If I now add 5 Argus::CameraSessions in parallel, without using them to acquire frames, the frame rate drops to 16 FPS.

I can also just start the 5 Argus cameras and get 30 FPS; it just takes too much performance from the TX1.

I think it could help if you post the tegrastats when you are using 5 cameras. This way I could compare the system load with my implementation.

Does that mean there are two instances running at the same time: one is Argus opened without grabbing frames, and the other runs the stitching from testsrc and only gets 16 FPS?

Yes, exactly. In this case testsrc (I will call it fakesrc) is actually just an allocated, unmodified memory buffer.
Here is an FPS overview of running the stitching application:

  • fakesrc (random data): 30 FPS
  • videotestsrc: 30 FPS
  • nvcamerasrc: 16 FPS
  • argus: 16 FPS
  • fakesrc + argus in background: 16 FPS
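
For reference, the frame rates in the list translate into per-frame time budgets as follows. This is plain arithmetic on the numbers above; the helper name `frame_time_ms` is made up for illustration.

```python
def frame_time_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds."""
    return 1000.0 / fps

target = frame_time_ms(30)    # budget per stitched frame at 30 FPS
measured = frame_time_ms(16)  # time per frame observed with argus / nvcamerasrc

print(f"30 FPS budget  : {target:.1f} ms")
print(f"16 FPS measured: {measured:.1f} ms")
print(f"extra per frame: {measured - target:.1f} ms")
```

In other words, with the cameras running each stitched frame takes roughly 29 ms longer than the 33.3 ms budget.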

Hi crossfire,
Could you check whether the Argus daemon is really inactive when your Argus app runs without grabbing frames?

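sudo su
# kill expects a PID, so stop the running daemon by name instead
pkill argus_daemon
# enable the daemon's logging, then start it from this shell
export enableCamScfLogs=1
/usr/sbin/argus_daemon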

Hi Shane,

Thank you, I will test this next week.

We have found CSI cameras that can provide UYVY images over v4l2. Using 5x v4l2src @ 1080p, we can run the video stitching application at 25 FPS. These cameras seem to take much less system performance than the cameras we were using over nvcamerasrc.

I made a small performance benchmark that compares v4l2src with nvcamerasrc. Note that nvcamerasrc does the I420 conversion; I was not expecting this to be so performance hungry, since it should be offloaded to the ISP. Am I wrong in this expectation?

gst-launch-1.0 -v \
v4l2src device=/dev/video0 io-mode=1 ! video/x-bayer, format=rggb, width=1640, height=1232, framerate=30/1 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
v4l2src device=/dev/video1 io-mode=1 ! video/x-bayer, format=rggb, width=1640, height=1232, framerate=30/1 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
v4l2src device=/dev/video2 io-mode=1 ! video/x-bayer, format=rggb, width=1640, height=1232, framerate=30/1 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
v4l2src device=/dev/video3 io-mode=1 ! video/x-bayer, format=rggb, width=1640, height=1232, framerate=30/1 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
v4l2src device=/dev/video4 io-mode=1 ! video/x-bayer, format=rggb, width=1640, height=1232, framerate=30/1 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true"

...

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink1: last-message = rendered: 225, dropped: 0, current: 31.01, average: 31.14
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink3: last-message = rendered: 225, dropped: 0, current: 30.94, average: 31.14
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink4: last-message = rendered: 241, dropped: 0, current: 30.93, average: 31.18
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink2: last-message = rendered: 241, dropped: 0, current: 31.00, average: 31.15
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 241, dropped: 0, current: 31.03, average: 31.14
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink1: last-message = rendered: 241, dropped: 0, current: 30.99, average: 31.13

...

RAM 771/3995MB (lfb 685x4MB) cpu [16%,6%,40%,41%]@518 EMC 9%@408 AVP 52%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [14%,9%,43%,42%]@825 EMC 6%@665 AVP 45%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [21%,4%,39%,39%]@921 EMC 6%@665 AVP 45%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [3%,18%,34%,35%]@1224 EMC 4%@1065 AVP 52%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [25%,1%,39%,41%]@614 EMC 12%@408 AVP 52%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [13%,11%,40%,36%]@825 EMC 7%@665 AVP 52%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [16%,8%,41%,39%]@518 EMC 12%@408 AVP 52%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [17%,34%,23%,48%]@518 EMC 12%@408 AVP 45%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [35%,19%,24%,47%]@710 EMC 12%@408 AVP 38%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 771/3995MB (lfb 685x4MB) cpu [22%,26%,21%,47%]@518 EMC 12%@408 AVP 38%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734

----------------------------------------------------------------------------------------------------------------------------------------------

gst-launch-1.0 -v \
nvcamerasrc sensor-id=0 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
nvcamerasrc sensor-id=1 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
nvcamerasrc sensor-id=2 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
nvcamerasrc sensor-id=3 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
nvcamerasrc sensor-id=4 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true"

....

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink2: last-message = rendered: 929, dropped: 0, current: 30.93, average: 31.05
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink3: last-message = rendered: 930, dropped: 0, current: 30.81, average: 31.07
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink1: last-message = rendered: 929, dropped: 0, current: 31.38, average: 31.05
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink2: last-message = rendered: 977, dropped: 0, current: 30.78, average: 31.05
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink3: last-message = rendered: 978, dropped: 0, current: 30.55, average: 31.06
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink1: last-message = rendered: 977, dropped: 0, current: 31.20, average: 31.05

....

RAM 1620/3995MB (lfb 530x4MB) cpu [55%,53%,39%,47%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [50%,60%,43%,42%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [49%,61%,44%,41%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [43%,60%,36%,43%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [44%,63%,42%,39%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [46%,60%,38%,40%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1621/3995MB (lfb 530x4MB) cpu [46%,50%,42%,43%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [47%,53%,46%,44%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [47%,55%,47%,36%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [56%,45%,42%,38%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [45%,61%,42%,42%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [41%,61%,42%,43%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1620/3995MB (lfb 530x4MB) cpu [44%,60%,40%,49%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1637/3995MB (lfb 527x4MB) cpu [65%,53%,50%,52%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1622/3995MB (lfb 527x4MB) cpu [48%,65%,50%,51%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1622/3995MB (lfb 527x4MB) cpu [47%,62%,41%,42%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1622/3995MB (lfb 527x4MB) cpu [43%,62%,44%,42%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
RAM 1622/3995MB (lfb 527x4MB) cpu [44%,60%,40%,46%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734
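
To compare the two tegrastats dumps numerically rather than by eye, the lines can be parsed and averaged. The following is a minimal sketch: the function names are hypothetical, the regular expressions only cover the `cpu [...]@MHz` and `EMC %@MHz` fields as they appear in this post, and the sample lines are copied from the two runs above.

```python
import re
from statistics import mean

# Patterns for the two fields of interest. EMC is treated as optional
# because not every tegrastats dump contains it.
CPU_RE = re.compile(r"cpu \[([^\]]+)\]@(\d+)")
EMC_RE = re.compile(r"EMC (\d+)%@(\d+)")

def parse_line(line):
    """Return CPU loads/clock and, if present, EMC load/clock; None if no cpu field."""
    cpu = CPU_RE.search(line)
    if cpu is None:
        return None
    sample = {
        "cpu_loads": [int(p.rstrip("%")) for p in cpu.group(1).split(",")],
        "cpu_mhz": int(cpu.group(2)),
    }
    emc = EMC_RE.search(line)
    if emc:
        sample["emc_pct"] = int(emc.group(1))
        sample["emc_mhz"] = int(emc.group(2))
    return sample

def summarize(lines):
    """Average the parsed samples of one tegrastats run."""
    samples = [s for s in map(parse_line, lines) if s]
    if not samples:
        return {"samples": 0}
    summary = {
        "samples": len(samples),
        "avg_cpu_pct": mean(mean(s["cpu_loads"]) for s in samples),
        "avg_cpu_mhz": mean(s["cpu_mhz"] for s in samples),
    }
    with_emc = [s for s in samples if "emc_pct" in s]
    if with_emc:
        summary["avg_emc_pct"] = mean(s["emc_pct"] for s in with_emc)
        summary["avg_emc_mhz"] = mean(s["emc_mhz"] for s in with_emc)
    return summary

v4l2_lines = [
    "RAM 771/3995MB (lfb 685x4MB) cpu [16%,6%,40%,41%]@518 EMC 9%@408 AVP 52%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734",
    "RAM 771/3995MB (lfb 685x4MB) cpu [14%,9%,43%,42%]@825 EMC 6%@665 AVP 45%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734",
]
nvcam_lines = [
    "RAM 1620/3995MB (lfb 530x4MB) cpu [55%,53%,39%,47%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734",
    "RAM 1620/3995MB (lfb 530x4MB) cpu [50%,60%,43%,42%]@1734 EMC 29%@1600 AVP 12%@12 NVDEC 268 MSENC 268 GR3D 0%@76 EDP limit 1734",
]

for name, lines in (("v4l2src", v4l2_lines), ("nvcamerasrc", nvcam_lines)):
    print(name, summarize(lines))
```

The clocks are reported alongside the percentages on purpose: as far as I understand, tegrastats gives each percentage relative to the current clock, so 40% at 518 MHz and 40% at 1734 MHz are not the same amount of work, and the two runs should only be compared with that in mind.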

If I run Argus without grabbing frames, I still get tons of this output:

FiberScheduler: cc 4517, fiber 0x8d3d80 succeeded async operation

Thread 1 is working on CC 4517

CC 4517 completed step 10 in fiber 0x8d3d80

CC 4517 completed step 11 in fiber 0x8d3d80

CC 4517 completed step 12 in fiber 0x8d3d80

CC 4517 processing step 13 in fiber 0x8d3d80

FiberScheduler: cc 4517, fiber 0x8d3d80 in progress...

Thread 1 getting next capture

Thread 1 is waiting

Thread 2 is waiting

exposureTime=0.033333 analogGain=16.000000 digitalGain=1.000000 commonGain=0.533333 expComp=1.000000

lux in Statsupdate: isAohdrEnable=0 currentSceneLux 11.161

FiberScheduler: cc 4517, fiber 0x8d3d80 succeeded async operation

Thread 3 is working on CC 4517

CC 4517 completed step 14 in fiber 0x8d3d80

CC 4517 processing step 15 in fiber 0x8d3d80

FiberScheduler: cc 4517, fiber 0x8d3d80 in progress...

Thread 3 getting next capture

FiberScheduler: cc 4517, fiber 0x8d3d80 succeeded async operation

Thread 3 is working on CC 4517

CC 4517 processing step 16 in fiber 0x8d3d80

NV AF analysis algorithm is active.

Thread 4 is waiting

Thread 1 is waiting

FiberScheduler: cc 4517, fiber 0x8d3d80 in progress...

Thread 3 getting next capture

Thread 3 is waiting

Thread 2 is waiting

AfAnalysis cc 4517 push FK_ISP_RUN_NUMBER=417.
FiberScheduler: cc 4517, fiber 0x8d3d80 succeeded async operation

Thread 4 is working on CC 4517

CC 4517 processing step 17 in fiber 0x8d3d80

FiberScheduler: cc 4517, fiber 0x8d3d80 in progress...

Thread 4 getting next capture

FiberScheduler: cc 4517, fiber 0x8d3d80 succeeded async operation

Thread 4 is working on CC 4517

CC 4517 completed step 18 in fiber 0x8d3d80

Thread 1 is waiting

CC 4517 processing step 19 in fiber 0x8d3d80

FiberScheduler: cc 4517, fiber 0x8d3d80 in progress...

Thread 4 getting next capture

Thread 4 is waiting

Thread 3 is waiting

Thread 2 is waiting

FiberScheduler: cc 4517, fiber 0x8d3d80 succeeded async operation

Thread 1 is working on CC 4517

CC 4517 completed step 20 in fiber 0x8d3d80

CC 4517 completed step 21 in fiber 0x8d3d80

populateStaticProperties

CC 4517 completed step 22 in fiber 0x8d3d80

CC 4517 completed step 23 in fiber 0x8d3d80

CC 4517 completed step 24 in fiber 0x8d3d80

CC 4517 completed step 25 in fiber 0x8d3d80

CC 4517 completed step 26 in fiber 0x8d3d80

CC 4517 completed step 27 in fiber 0x8d3d80

Hi crossfire,
This experiment shows that Argus is actively capturing frame data even when you do not grab frames. That suggests your stitching code needs CPU resources. Argus + stitching without memory copies should give a good result.
Please also try boosting the system performance and check the result.

Hi Shane,

All clocks are boosted to maximum.
Did you check the benchmark in post #28?

It just compares v4l2src to nvcamerasrc.

Can you reproduce this on your 5-camera system?
Do you think the results for nvcamerasrc look OK?

Hi crossfire,

  1. Because nvcamerasrc needs to run 3A and lots of other things, I believe it is expected that its CPU usage is higher than that of v4l2src.
  2. Do you want me to reproduce this by launching 5 sensors with GStreamer?

Hi,

Yes, maybe you could launch this GStreamer pipeline so that we can compare the tegrastats:

gst-launch-1.0 -v \
nvcamerasrc sensor-id=0 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
nvcamerasrc sensor-id=1 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
nvcamerasrc sensor-id=2 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
nvcamerasrc sensor-id=3 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true" \
nvcamerasrc sensor-id=4 ! "video/x-raw(memory:NVMM), format=I420, width=1640, height=1232, framerate=30/1" ! nvvidconv ! video/x-raw, format=I420 ! fpsdisplaysink text-overlay=false video-sink="appsink max-buffers=2 drop=true"

You might have to change the resolution to 1080p if your sensor does not support 1640x1232.
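
Since the five branches differ only in `sensor-id`, and the resolution may need changing, the command line can also be generated. This is a small sketch that mirrors the pipeline quoted above; the function name `build_pipeline` and its parameters are made up for illustration, and nothing here checks which modes a given sensor actually supports.

```python
def build_pipeline(num_sensors=5, width=1640, height=1232, fps=30):
    """Build the multi-camera gst-launch-1.0 command as one shell string."""
    caps = (
        f'"video/x-raw(memory:NVMM), format=I420, width={width}, '
        f'height={height}, framerate={fps}/1"'
    )
    sink = (
        "fpsdisplaysink text-overlay=false "
        'video-sink="appsink max-buffers=2 drop=true"'
    )
    branches = [
        f"nvcamerasrc sensor-id={i} ! {caps} ! nvvidconv ! "
        f"video/x-raw, format=I420 ! {sink}"
        for i in range(num_sensors)
    ]
    # Join with line continuations; the last branch gets no trailing backslash.
    return "gst-launch-1.0 -v \\\n" + " \\\n".join(branches)

# Same benchmark at 1080p for sensors without a 1640x1232 mode.
print(build_pipeline(width=1920, height=1080))
```

The printed command can be pasted into a shell as is.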

Hi crossfire,
I guess my log won't help you fix your problem, but I will try to get it. Could you check whether the 3-sensor use case reaches 30 FPS? That use case has been verified internally for a 360-degree camera similar to your project.

Hi Shane,

The tegrastats would help to see whether the system performance of nvcamerasrc depends on the sensors being used or not.

I can run the stitching application with 3 cameras (using nvcamerasrc) at between 26 and 28 FPS.

Below is the tegrastats output with 5 cameras launched. Does your stitching use a CUDA implementation? I don't understand why it is so sensitive to CPU load.

RAM 1919/3995MB (lfb 362x4MB) cpu [59%,32%,37%,33%]@1428 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [55%,40%,45%,43%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [55%,36%,42%,39%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [57%,37%,44%,35%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [55%,35%,44%,38%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [59%,43%,37%,37%]@1224 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [57%,38%,37%,39%]@1555 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [57%,44%,43%,42%]@1555 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [58%,37%,41%,42%]@1428 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [63%,39%,46%,38%]@1632 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [57%,38%,35%,38%]@1632 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [47%,54%,25%,41%]@1326 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [40%,60%,44%,40%]@1555 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [38%,60%,35%,40%]@1555 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [43%,58%,35%,38%]@1326 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [35%,55%,43%,43%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [40%,58%,36%,36%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [42%,57%,45%,41%]@1555 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [39%,56%,41%,32%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [41%,57%,37%,44%]@1555 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [34%,55%,34%,39%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [40%,56%,41%,40%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [42%,58%,41%,39%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [41%,54%,44%,40%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [45%,54%,46%,37%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [38%,56%,35%,38%]@1326 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [37%,54%,42%,42%]@1326 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [38%,57%,37%,42%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [37%,57%,37%,39%]@1326 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [41%,58%,40%,37%]@1555 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [41%,53%,42%,38%]@1734 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [39%,54%,43%,34%]@1734 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [41%,50%,47%,38%]@1326 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [36%,54%,40%,40%]@1632 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [41%,56%,30%,40%]@1555 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [42%,56%,37%,32%]@1734 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [36%,55%,36%,35%]@1632 GR3D 0%@76 EDP limit 0
RAM 1920/3995MB (lfb 362x4MB) cpu [45%,56%,43%,35%]@1555 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [40%,55%,36%,37%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [40%,58%,42%,35%]@1555 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [38%,53%,40%,33%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [39%,53%,38%,45%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [37%,56%,33%,38%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [40%,57%,41%,32%]@1555 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [43%,56%,45%,35%]@1428 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [39%,56%,38%,34%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [44%,54%,39%,44%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [36%,57%,34%,40%]@1734 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [33%,59%,39%,44%]@1132 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [40%,52%,38%,27%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [40%,57%,34%,36%]@1428 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [41%,54%,42%,37%]@1428 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [35%,56%,35%,33%]@1632 GR3D 0%@76 EDP limit 0
RAM 1919/3995MB (lfb 362x4MB) cpu [35%,56%,32%,40%]@1224 GR3D 0%@76 EDP limit 0

Hi Shane,

Thank you for the stats. Since the performance is similar to my benchmark, I think there is no problem with the sensor.

Since I do not know what the ISP is actually doing, I find this usage a little high.
Can you explain where nvcamerasrc is using main memory?
What does “run 3A” mean for nvcamerasrc?

The stitching is performed in CUDA. I think the limiting factor is not the CPU but the memory bandwidth / EMC. In several performance benchmarks I observed that the EMC load never goes much above 50%. Once the EMC reaches 50%, the application starts to slow down.

Could it be that EMC=50% is calculated wrongly and is actually 100%?
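
A back-of-the-envelope estimate helps to put the EMC numbers in perspective. The sketch below computes the raw traffic of five 1640x1232 I420 streams at 30 FPS and relates it to a theoretical peak. The peak is an assumption, not something stated in this thread: it presumes a 64-bit DDR memory interface running at the 1600 MHz EMC clock shown by tegrastats. How many times each frame is actually read or written by the ISP, nvvidconv and the stitcher is unknown here, so the number of passes is left as a parameter.

```python
def stream_bytes_per_s(width, height, bytes_per_pixel, fps, cameras):
    """Bytes per second needed to touch every frame of every camera once."""
    return width * height * bytes_per_pixel * fps * cameras

I420_BPP = 1.5  # full-resolution 8-bit Y plane plus 2x2-subsampled U and V planes

one_pass = stream_bytes_per_s(1640, 1232, I420_BPP, 30, 5)

# ASSUMPTION: theoretical peak = EMC clock * 2 transfers per clock * 8 bytes (64-bit bus).
PEAK = 1600e6 * 2 * 8

for passes in (1, 2, 4, 8):
    traffic = passes * one_pass
    print(f"{passes} pass(es): {traffic / 1e9:.2f} GB/s, "
          f"{traffic / PEAK:.1%} of the assumed peak")
```

Sustained DRAM bandwidth is normally well below the theoretical peak, so a slowdown long before the counter reaches 100% would not be surprising. This does not settle how tegrastats computes the EMC percentage.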

Hi Crossfire,
Can you try some memory stress tools while running your stitching to confirm the memory theory?
Also, if there are no memory tools available, try CPU stress tools to check whether the stitching is affected by CPU load.
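
If no ready-made stress tool is installed, the CPU part of this test can be approximated with a few busy-loop processes. This is only a sketch with hypothetical function names: it loads a chosen number of cores with pure arithmetic, which touches very little memory, so it should mostly isolate the effect of CPU load.

```python
import multiprocessing as mp
import time

def burn(seconds):
    """Spin in a tight loop for `seconds`; return the number of iterations."""
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n

def stress(workers, seconds):
    """Run `workers` busy-loop processes in parallel for `seconds`."""
    procs = [mp.Process(target=burn, args=(seconds,)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    # Short demo run; raise `seconds` (e.g. to 60) for a real measurement.
    stress(workers=2, seconds=1.0)
```

Running this with 1 to 4 workers while the stitcher uses the fake frame source, and noting the FPS each time, would show how sensitive the stitching is to CPU load alone.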

Hi Shane,
It will take me some time to do that efficiently. How can I best stress the EMC without using the CPU/GPU?

I just realized that the GStreamer benchmark you posted was not run at maximized clock rates, and the EMC is missing. I would be really grateful if you could run it again with maximized clocks and as sudo.
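
One crude way to generate memory traffic is to copy a buffer that is much larger than the CPU caches in a loop. Note the limitation: this is not free of CPU use, because the copy itself occupies one core, so it only approximates an EMC-only stress. The function name and buffer size below are arbitrary choices for illustration.

```python
import time

def copy_bandwidth(buf_mb=64, seconds=5.0):
    """Copy one large buffer into another repeatedly; return GB/s (read + write)."""
    size = buf_mb * 1024 * 1024
    src = bytearray(size)
    dst = bytearray(size)
    copied = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        dst[:] = src      # bulk copy: reads `size` bytes and writes `size` bytes
        copied += size
    elapsed = time.perf_counter() - start
    return 2 * copied / elapsed / 1e9

if __name__ == "__main__":
    # Short demo run; use a longer duration while the stitcher is running.
    print(f"{copy_bandwidth(seconds=1.0):.2f} GB/s of combined read/write traffic")
```

Watching the EMC value in tegrastats while this runs, first alone and then together with the stitcher, would indicate whether the frame rate drops as memory traffic rises.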

I have no idea which tools can stress the memory either. Could you confirm the CPU stress first?

RAM 1922/3995MB (lfb 340x4MB) cpu [46%,33%,36%,31%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [32%,42%,33%,35%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [35%,53%,29%,33%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [34%,53%,34%,32%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [33%,53%,35%,36%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [39%,50%,34%,34%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [29%,48%,37%,37%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [40%,53%,34%,29%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [36%,54%,30%,34%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [50%,36%,34%,35%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [51%,36%,40%,35%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [50%,29%,38%,35%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [49%,34%,36%,30%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [51%,34%,31%,35%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [48%,28%,38%,38%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [47%,33%,32%,34%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [51%,23%,33%,34%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1921/3995MB (lfb 340x4MB) cpu [48%,37%,34%,33%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [52%,35%,30%,26%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [48%,38%,33%,36%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [49%,31%,34%,31%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [54%,32%,31%,35%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [53%,31%,30%,30%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [48%,36%,36%,29%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [47%,27%,34%,34%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [52%,29%,32%,31%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [54%,34%,27%,35%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [52%,29%,30%,33%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [53%,29%,34%,30%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [57%,32%,28%,35%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [54%,38%,36%,27%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [46%,37%,35%,30%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [51%,34%,37%,33%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [53%,29%,35%,29%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1922/3995MB (lfb 340x4MB) cpu [54%,30%,38%,34%]@1734 EMC 30%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1893/3995MB (lfb 340x4MB) cpu [43%,24%,33%,30%]@1734 EMC 25%@1600 AVP 2%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734