tegra_multimedia_api/samples/frontend

Hi Folks

  1. We have two cameras and want to run 3 channels of 1080p encoding. At the same time we would like to run TRT inference on a single scaled-down camera channel. Currently I’m following the argus/samples/multiChannel example. It seems to work well but is not stable over longer runs: the second input encoder channel stops encoding after 50 sec. We are also doing some processing on the GPU.

  2. I would like to follow the flow of data in this example and understand how many frame buffer copies take place, so that we can judge whether its performance would translate to our use case.

I see that data is getting copied for rendering and TRT processing -->

iNativeBuffer->copyToNvBuffer(buf.fd);

Will this adversely affect overall DDR performance when we drive 3 encodes at 1080p @ 30 fps? For example, the 3 encoders in the original example are operated at (640, 480), (1280, 720), and (1920, 1080).
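A rough way to size the extra DDR traffic from a per-frame copy such as copyToNvBuffer is a back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: a 4:2:0 frame layout (1.5 bytes per pixel), 30 fps, one copy per frame per stream, and each copy counted as one full read plus one full write. The function name is made up for illustration, and the real number of copies in the sample has to be confirmed from its source.

```python
def copy_traffic_mb_per_s(width, height, fps=30.0, bytes_per_pixel=1.5):
    """Estimated DDR traffic (MB/s) for one copy of every frame.

    Assumes a 4:2:0 layout (1.5 bytes per pixel) and counts each copy as
    one read plus one write of the full frame.
    """
    frame_bytes = width * height * bytes_per_pixel
    return 2 * frame_bytes * fps / 1e6


# Resolutions used by the original example versus the all-1080p case.
original = [(640, 480), (1280, 720), (1920, 1080)]
all_1080p = [(1920, 1080)] * 3

for name, streams in (("original", original), ("3 x 1080p", all_1080p)):
    total = sum(copy_traffic_mb_per_s(w, h) for w, h in streams)
    print(f"{name}: {total:.1f} MB/s")
```

Under these assumptions the copy traffic grows from roughly 0.3 GB/s to roughly 0.56 GB/s when all three streams run at 1080p. This covers only the copies; camera, encoder and GPU traffic come on top of it.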

With those settings I see the following profiling result -

Time elapsed:20 ms per frame in past 100 frames
[13632.260696] trt: FPS: 30.008732
[13633.260600] trt: FPS: 29.987015
[13634.255863] trt: FPS: 29.996340
Time elapsed:20 ms per frame in past 100 frames
[13635.255631] trt: FPS: 30.004890
[13636.272200] trt: FPS: 29.995470
[13637.258763] trt: FPS: 30.014797
Time elapsed:21 ms per frame in past 100 frames
[13638.270500] trt: FPS: 30.009783
[13639.270984] trt: FPS: 29.975960
[13640.269304] trt: FPS: 29.998680
[13641.269160] trt: FPS: 29.996880
Time elapsed:20 ms per frame in past 100 frames
[13642.255780] trt: FPS: 30.004501
[13643.255742] trt: FPS: 30.001080
[13644.256223] trt: FPS: 29.986296
Time elapsed:22 ms per frame in past 100 frames

But when all 3 encoders are cranked up to 1080p resolution (everything else in the system remaining the same), I see some errors -

Time elapsed:23 ms per frame in past 100 frames
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
[17782.984971] trt: FPS: 16.859261
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
[17784.093142] trt: FPS: 14.546274
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
[17786.026728] trt: FPS: 16.034163
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
Time elapsed:26 ms per frame in past 100 frames
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
[17788.343454] trt: FPS: 13.379693
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
[17789.342046] trt: FPS: 14.991395
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
[17790.376679] trt: FPS: 14.063563
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
[17792.166780] trt: FPS: 19.438797
Time elapsed:28 ms per frame in past 100 frames
[17793.178689] trt: FPS: 28.412458
[17794.156891] trt: FPS: 29.622911
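The gap between the requested and the granted ISO bandwidth, and the FPS drop that accompanies it, can be pulled out of a log like the one above with a short script. This is a sketch only: the regular expressions are written against the exact message text shown here, the helper names are invented for illustration, and since the message does not state the unit of the two numbers, only their ratio is computed.

```python
import re

# Patterns written against the exact log text shown in this thread.
SCF_RE = re.compile(r"ISO BW requested not set\. Requested = (\d+) Set = (\d+)")
FPS_RE = re.compile(r"trt: FPS: ([0-9.]+)")


def iso_bw_shortfall(line):
    """Return (requested, granted, shortfall_fraction), or None if no match."""
    m = SCF_RE.search(line)
    if m is None:
        return None
    requested, granted = int(m.group(1)), int(m.group(2))
    return requested, granted, (requested - granted) / requested


def fps_values(log_text):
    """Return every 'trt: FPS' reading found in the text, in order."""
    return [float(v) for v in FPS_RE.findall(log_text)]


SAMPLE = """\
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)
[17782.984971] trt: FPS: 16.859261
[17784.093142] trt: FPS: 14.546274
"""

requested, granted, shortfall = iso_bw_shortfall(SAMPLE.splitlines()[0])
print(f"requested={requested} granted={granted} shortfall={shortfall:.1%}")
print("min FPS in sample:", min(fps_values(SAMPLE)))
```

For the values in this log the granted figure is about 6% below the requested one.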

Do the ISO BW messages refer to DDR bandwidth? Is this an issue caused by the higher-resolution encode?

Thanks

Hi,
I don’t see a multiChannel sample. Do you mean multiSensor or multiStream?

My apologies, I meant multiSensor.

The bigger issue that I would like your help with is: how can we have 3 concurrent 1080p (30 fps each) encode sessions running and still have enough DDR response time for GPU and CPU transactions, such that neither the GPU nor the CPU slows down due to encoder traffic?

Why do I get the aforementioned error messages from the frontend example when each encode is 1080p?

Thanks

Hi,
Not sure what DDR is? Please share the full name.
So you have two cameras and each camera goes into one TRT model?

DDR == double data rate, i.e. the shared DRAM (global memory) that every unit on the chip reads from and writes to.

The pipeline we have looks like -

Cam1 --> encoder --> encoded bitstream
     |
     |--> GPU --> CPU (frame processing, and data annotation ) -->
                                                                  | 
Cam2 --> encoder --> encoded bitstream                            |--> select either cam source --> encoder --> encoded bitstream
     |                                                            | 
     |--> GPU --> CPU (frame processing, and data annotation ) -->

Basically we need GPU/CPU processing along with three concurrent encodes. Each encode is 1920x1080 @ 30 fps.

I am looking to model my code after tegra_multimedia_api/samples/frontend. My question to you is: why does the frontend example start giving errors (mentioned in #1 above) when I change each of the 3 encoders to 1080p resolution?

Thanks

We don’t have SQA tests that match your use case. It looks like you are hitting some sort of limitation, which leads to the warning print. Have you executed jetson_clocks.sh and run tegrastats to check system loading?

Hi DaneLLL

Yes, I tried with jetson_clocks and tegrastats. I see the same error after running jetson_clocks. Following is the tegrastats log:

RAM 2924/7851MB (lfb 676x4MB) cpu [18%@2035,off,off,4%@2034,13%@2035,18%@2035] EMC 24%@1600 APE 150 GR3D 38%@1134
RAM 3325/7851MB (lfb 656x4MB) cpu [18%@2035,off,off,35%@2035,68%@2034,20%@2035] EMC 23%@1600 APE 150 GR3D 5%@1134
RAM 3705/7851MB (lfb 580x4MB) cpu [8%@2038,off,off,77%@2035,26%@2035,13%@2035] EMC 20%@1600 APE 150 GR3D 0%@1134
RAM 4225/7851MB (lfb 504x4MB) cpu [26%@2000,off,off,7%@2019,11%@2011,54%@2009] EMC 19%@1600 APE 150 GR3D 0%@1134
RAM 4375/7851MB (lfb 466x4MB) cpu [39%@2001,off,off,32%@2003,37%@1996,31%@1999] EMC 53%@1600 APE 150 MSENC 1164 GR3D 72%@1134
RAM 4376/7851MB (lfb 465x4MB) cpu [22%@2034,off,off,24%@2034,22%@2035,24%@2036] EMC 68%@1600 APE 150 MSENC 1164 GR3D 99%@1134
RAM 4377/7851MB (lfb 464x4MB) cpu [24%@2000,off,off,22%@2005,19%@2001,16%@2000] EMC 71%@1600 APE 150 MSENC 1164 GR3D 55%@1134
RAM 4378/7851MB (lfb 464x4MB) cpu [33%@2001,off,off,29%@2016,23%@2016,25%@2005] EMC 72%@1600 APE 150 MSENC 1164 GR3D 44%@1134
RAM 4378/7851MB (lfb 463x4MB) cpu [24%@2034,off,off,29%@2035,22%@2034,19%@2036] EMC 66%@1600 APE 150 MSENC 1164 GR3D 96%@1134
RAM 4378/7851MB (lfb 463x4MB) cpu [26%@2004,off,off,26%@2003,29%@2011,17%@2008] EMC 59%@1600 APE 150 MSENC 1164 GR3D 99%@1134
RAM 4378/7851MB (lfb 463x4MB) cpu [20%@1996,off,off,25%@1994,15%@1991,16%@1994] EMC 52%@1600 APE 150 MSENC 1164 GR3D 99%@1134
RAM 4369/7851MB (lfb 462x4MB) cpu [21%@2033,off,off,17%@2034,27%@2035,15%@2035] EMC 51%@1600 APE 150 MSENC 1164 GR3D 32%@1134
RAM 4368/7851MB (lfb 462x4MB) cpu [28%@2027,off,off,30%@2034,26%@2035,26%@2035] EMC 55%@1600 APE 150 MSENC 1164 GR3D 99%@1134
RAM 4368/7851MB (lfb 462x4MB) cpu [36%@1997,off,off,31%@2003,26%@2016,20%@2011] EMC 60%@1600 APE 150 MSENC 1164 GR3D 99%@1134
RAM 4368/7851MB (lfb 462x4MB) cpu [31%@2035,off,off,29%@2035,31%@2033,26%@2032] EMC 66%@1600 APE 150 MSENC 1164 GR3D 93%@1134
RAM 4369/7851MB (lfb 462x4MB) cpu [22%@2041,off,off,20%@2034,18%@2036,15%@2034] EMC 55%@1600 APE 150 MSENC 1164 GR3D 93%@1134
RAM 4368/7851MB (lfb 462x4MB) cpu [35%@2034,off,off,31%@2034,34%@2034,27%@2035] EMC 64%@1600 APE 150 MSENC 1164 GR3D 99%@1134
RAM 4369/7851MB (lfb 462x4MB) cpu [19%@2033,off,off,25%@2035,25%@2035,22%@2035] EMC 58%@1600 APE 150 MSENC 1164 GR3D 42%@1134
RAM 4369/7851MB (lfb 462x4MB) cpu [19%@2035,off,off,13%@2036,15%@2035,14%@2035] EMC 51%@1600 APE 150 MSENC 1164 GR3D 42%@1134
RAM 4370/7851MB (lfb 462x4MB) cpu [20%@2035,off,off,24%@2034,15%@2035,14%@2034] EMC 47%@1600 APE 150 MSENC 1164 GR3D 0%@1134
RAM 4370/7851MB (lfb 462x4MB) cpu [34%@2035,off,off,16%@2034,20%@2035,20%@2034] EMC 53%@1600 APE 150 MSENC 1164 GR3D 69%@1134
RAM 4370/7851MB (lfb 462x4MB) cpu [31%@1990,off,off,17%@1995,30%@1987,21%@1992] EMC 58%@1600 APE 150 MSENC 1164 GR3D 56%@1134
RAM 4370/7851MB (lfb 461x4MB) cpu [31%@1997,off,off,39%@1999,30%@2001,26%@2033] EMC 60%@1600 APE 150 MSENC 1164 GR3D 18%@1134
RAM 4370/7851MB (lfb 460x4MB) cpu [35%@2034,off,off,25%@2034,32%@2033,21%@2035] EMC 63%@1600 APE 150 MSENC 1164 GR3D 85%@1134
RAM 4370/7851MB (lfb 460x4MB) cpu [27%@2031,off,off,20%@2035,29%@2034,15%@2033] EMC 56%@1600 APE 150 MSENC 1164 GR3D 90%@1134
RAM 4370/7851MB (lfb 459x4MB) cpu [31%@2035,off,off,22%@2034,29%@2034,26%@2034] EMC 58%@1600 APE 150 MSENC 1164 GR3D 60%@1134
RAM 4371/7851MB (lfb 459x4MB) cpu [32%@2000,off,off,37%@2001,35%@1999,27%@2002] EMC 64%@1600 APE 150 MSENC 1164 GR3D 0%@1134
RAM 4371/7851MB (lfb 459x4MB) cpu [32%@2035,off,off,35%@2034,31%@2035,25%@2035] EMC 63%@1600 APE 150 MSENC 1164 GR3D 92%@1134
RAM 4371/7851MB (lfb 458x4MB) cpu [45%@2034,off,off,35%@2036,35%@2034,28%@2034] EMC 65%@1600 APE 150 MSENC 1164 GR3D 88%@1134
RAM 4372/7851MB (lfb 458x4MB) cpu [35%@2005,off,off,24%@2008,36%@2007,21%@2010] EMC 62%@1600 APE 150 MSENC 1164 GR3D 3%@1134
RAM 4372/7851MB (lfb 458x4MB) cpu [29%@2035,off,off,30%@2035,31%@2035,22%@2035] EMC 60%@1600 APE 150 MSENC 1164 GR3D 99%@1134
RAM 4372/7851MB (lfb 457x4MB) cpu [34%@2036,off,off,30%@2034,34%@2035,24%@2034] EMC 60%@1600 APE 150 MSENC 1164 GR3D 15%@1134
RAM 4372/7851MB (lfb 457x4MB) cpu [11%@2036,off,off,26%@2033,23%@2035,19%@2034] EMC 54%@1600 APE 150 MSENC 1164 GR3D 0%@1134
RAM 4372/7851MB (lfb 456x4MB) cpu [20%@2000,off,off,28%@2002,24%@2005,14%@2004] EMC 56%@1600 APE 150 MSENC 1164 GR3D 94%@1134
RAM 4373/7851MB (lfb 455x4MB) cpu [38%@2034,off,off,35%@2034,31%@2032,27%@2033] EMC 64%@1600 APE 150 MSENC 1164 GR3D 15%@1134
RAM 4373/7851MB (lfb 455x4MB) cpu [20%@2036,off,off,33%@2035,20%@2035,17%@2035] EMC 57%@1600 APE 150 MSENC 1164 GR3D 50%@1134
RAM 4373/7851MB (lfb 455x4MB) cpu [16%@2034,off,off,20%@2034,14%@2035,10%@2033] EMC 50%@1600 APE 150 MSENC 1164 GR3D 48%@1134
RAM 4374/7851MB (lfb 455x4MB) cpu [13%@2031,off,off,20%@2034,16%@2037,13%@2035] EMC 48%@1600 APE 150 MSENC 1164 GR3D 97%@1134
RAM 4374/7851MB (lfb 454x4MB) cpu [15%@2033,off,off,15%@2035,13%@2035,14%@2035] EMC 46%@1600 APE 150 MSENC 1164 GR3D 3%@1134
RAM 4374/7851MB (lfb 453x4MB) cpu [10%@2035,off,off,12%@2035,16%@2035,15%@2035] EMC 47%@1600 APE 150 MSENC 1164 GR3D 39%@1134
RAM 4374/7851MB (lfb 453x4MB) cpu [12%@2034,off,off,11%@2032,10%@2034,13%@2035] EMC 45%@1600 APE 150 MSENC 1164 GR3D 87%@1134
RAM 4371/7851MB (lfb 453x4MB) cpu [18%@2034,off,off,18%@2035,17%@2035,14%@2036] EMC 58%@1600 APE 150 MSENC 1164 GR3D 89%@1134
RAM 4371/7851MB (lfb 453x4MB) cpu [21%@2034,off,off,19%@2035,21%@2033,20%@2035] EMC 68%@1600 APE 150 MSENC 1164 GR3D 0%@1134
RAM 4368/7851MB (lfb 453x4MB) cpu [24%@2012,off,off,23%@2008,23%@2012,21%@2023] EMC 69%@1600 APE 150 MSENC 1164 GR3D 0%@1134
RAM 4368/7851MB (lfb 453x4MB) cpu [25%@1999,off,off,25%@1998,24%@1997,24%@1999] EMC 72%@1600 APE 150 MSENC 1164 GR3D 35%@1134
RAM 4368/7851MB (lfb 452x4MB) cpu [18%@2002,off,off,13%@2005,14%@2006,13%@2007] EMC 59%@1600 APE 150 MSENC 1164 GR3D 9%@1134
RAM 4368/7851MB (lfb 452x4MB) cpu [13%@2035,off,off,13%@2035,15%@2035,10%@2035] EMC 50%@1600 APE 150 MSENC 1164 GR3D 99%@1134
RAM 4369/7851MB (lfb 452x4MB) cpu [15%@2005,off,off,12%@2004,14%@1997,13%@2000] EMC 50%@1600 APE 150 MSENC 1164 GR3D 22%@1134

It seems the GPU is getting saturated at times. However, what I do not understand is why the encoder would give an error.

Thanks

Further,

Following is the output of tegrastats after boosting up with nvpmodel.

How can we manage to put some ‘load’ on the Denver cores?

Thanks

nvidia@tegra-ubuntu:~$ sudo nvpmodel -m 0
nvidia@tegra-ubuntu:~$ sudo ~/tegrastats
RAM 4531/7851MB (lfb 392x4MB) cpu [0%@2013,0%@345,0%@345,0%@2007,0%@2007,0%@2007] EMC 32%@1600 APE 150 MSENC 1164 GR3D 17%@1300
RAM 4533/7851MB (lfb 391x4MB) cpu [33%@345,1%@346,0%@345,29%@345,28%@345,31%@345] EMC 39%@1600 APE 150 MSENC 1164 GR3D 98%@420
RAM 4535/7851MB (lfb 391x4MB) cpu [32%@345,0%@345,0%@345,21%@345,28%@345,27%@345] EMC 32%@1600 APE 150 MSENC 1164 GR3D 0%@216
RAM 4537/7851MB (lfb 390x4MB) cpu [27%@345,0%@346,0%@345,29%@345,26%@345,23%@348] EMC 30%@1600 APE 150 MSENC 1164 GR3D 99%@216
RAM 4539/7851MB (lfb 390x4MB) cpu [29%@345,0%@345,0%@345,28%@345,28%@345,26%@345] EMC 29%@1600 APE 150 MSENC 1164 GR3D 0%@114
RAM 4539/7851MB (lfb 390x4MB) cpu [28%@345,0%@345,0%@345,31%@344,29%@345,34%@345] EMC 29%@1600 APE 150 MSENC 1164 GR3D 99%@114
RAM 4539/7851MB (lfb 389x4MB) cpu [35%@345,0%@345,0%@345,31%@345,26%@345,27%@345] EMC 29%@1600 APE 150 MSENC 1164 GR3D 55%@1134
RAM 4540/7851MB (lfb 389x4MB) cpu [30%@345,0%@345,0%@345,23%@347,27%@345,33%@348] EMC 28%@1600 APE 150 MSENC 1164 GR3D 0%@216
RAM 4540/7851MB (lfb 389x4MB) cpu [25%@345,1%@345,0%@345,32%@348,30%@488,23%@499] EMC 29%@1600 APE 150 MSENC 1164 GR3D 99%@216
RAM 4538/7851MB (lfb 389x4MB) cpu [31%@345,0%@345,0%@345,27%@348,29%@348,27%@348] EMC 28%@1600 APE 150 MSENC 1164 GR3D 0%@624
RAM 4539/7851MB (lfb 389x4MB) cpu [28%@345,0%@345,0%@345,26%@345,33%@345,28%@345] EMC 28%@1600 APE 150 MSENC 1164 GR3D 0%@624
RAM 4539/7851MB (lfb 389x4MB) cpu [25%@345,0%@345,0%@345,27%@348,28%@346,27%@346] EMC 28%@1600 APE 150 MSENC 1164 GR3D 0%@420
RAM 4539/7851MB (lfb 389x4MB) cpu [23%@346,0%@345,0%@345,24%@345,30%@345,30%@345] EMC 28%@1600 APE 150 MSENC 1164 GR3D 99%@216
RAM 4540/7851MB (lfb 387x4MB) cpu [36%@345,0%@345,0%@345,27%@344,27%@344,27%@345] EMC 29%@1600 APE 150 MSENC 1164 GR3D 99%@216
RAM 4540/7851MB (lfb 387x4MB) cpu [36%@806,0%@345,0%@345,32%@653,36%@653,31%@653] EMC 38%@1600 APE 150 MSENC 1164 GR3D 76%@1134
RAM 4540/7851MB (lfb 387x4MB) cpu [36%@806,0%@345,0%@499,34%@806,33%@806,32%@806] EMC 37%@1600 APE 150 MSENC 1164 GR3D 98%@114
RAM 4541/7851MB (lfb 386x4MB) cpu [35%@345,0%@345,0%@345,33%@346,38%@348,34%@348] EMC 46%@1600 APE 150 MSENC 1164 GR3D 0%@930
RAM 4541/7851MB (lfb 386x4MB) cpu [28%@345,0%@345,0%@345,28%@345,25%@346,27%@345] EMC 35%@1600 APE 150 MSENC 1164 GR3D 58%@624
RAM 4542/7851MB (lfb 386x4MB) cpu [29%@345,0%@345,0%@345,27%@345,25%@345,26%@345] EMC 31%@1600 APE 150 MSENC 1164 GR3D 99%@114
RAM 4542/7851MB (lfb 386x4MB) cpu [27%@499,0%@345,0%@345,25%@499,28%@499,29%@498] EMC 29%@1600 APE 150 MSENC 1164 GR3D 22%@114
RAM 4542/7851MB (lfb 386x4MB) cpu [29%@499,0%@345,0%@345,25%@499,28%@499,28%@499] EMC 28%@1600 APE 150 MSENC 1164 GR3D 10%@726
RAM 4543/7851MB (lfb 386x4MB) cpu [29%@345,0%@345,0%@345,30%@345,27%@347,28%@348] EMC 28%@1600 APE 150 MSENC 1164 GR3D 99%@216
RAM 4543/7851MB (lfb 385x4MB) cpu [31%@345,0%@345,0%@345,30%@347,27%@345,29%@348] EMC 28%@1600 APE 150 MSENC 1164 GR3D 99%@114
RAM 4544/7851MB (lfb 385x4MB) cpu [31%@345,0%@345,0%@345,27%@345,28%@345,24%@345] EMC 28%@1600 APE 150 MSENC 1164 GR3D 99%@114
RAM 4544/7851MB (lfb 385x4MB) cpu [27%@345,0%@345,0%@345,25%@345,29%@345,28%@345] EMC 28%@1600 APE 150 MSENC 1164 GR3D 99%@318
RAM 4544/7851MB (lfb 385x4MB) cpu [31%@345,0%@345,0%@345,31%@345,27%@345,32%@345] EMC 30%@1600 APE 150 MSENC 1164 GR3D 10%@828
RAM 4544/7851MB (lfb 385x4MB) cpu [35%@498,0%@345,0%@345,37%@499,38%@499,34%@499] EMC 39%@1600 APE 150 MSENC 1164 GR3D 99%@522
RAM 4544/7851MB (lfb 384x4MB) cpu [33%@498,0%@345,0%@345,36%@501,35%@499,33%@501] EMC 47%@1600 APE 150 MSENC 1164 GR3D 0%@1032
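One way to steer work onto specific cores on Linux is CPU affinity. The sketch below is illustrative only and rests on an assumption that is not confirmed anywhere in this thread: that the two Denver cores are CPU indexes 1 and 2 (the two cores that show as off or at 0% in the logs above). The helper names are made up for the example, and a C++ application would use the equivalent affinity calls of its threading library rather than this Python code.

```python
import os

# Assumption: the Denver cores are CPU indexes 1 and 2 on this board.
DENVER_CPUS = {1, 2}


def pick_cpus(preferred, available):
    """Return the preferred CPUs that are available, else all available CPUs."""
    chosen = set(preferred) & set(available)
    return chosen if chosen else set(available)


def pin_current_process(preferred=DENVER_CPUS):
    """Restrict the calling process to the preferred CPUs (Linux only)."""
    available = os.sched_getaffinity(0)   # CPUs this process may run on now
    chosen = pick_cpus(preferred, available)
    os.sched_setaffinity(0, chosen)       # apply the new affinity mask
    return chosen
```

Calling pin_current_process() at the start of a CPU-heavy worker process would confine it to those cores when they are online; if they are not, the fallback leaves the affinity unchanged.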

Hi,
The print is a harmless warning.

SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 4989600 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 474)

Your issue looks more like hitting max GPU performance. Do you see issues if you skip GPU processing and do video encoding only? It should be able to run three 1080p30 encoding threads.

Hi DaneLLL,

Our use case needs 3 encodes and GPU processing together. Hence I want to follow this example. Is there another way to ascertain whether we are getting limited by the GPU or by system bandwidth?

Thanks

Hi,
tegrastats is the tool that shows the loading of HW components (CPU, GPU, EMC, MSENC, …).
From your log, GR3D shows full loading most of the time.
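To put numbers on that reading of the log, the EMC and GR3D fields can be extracted from each tegrastats line and summarized. The following is a minimal sketch, assuming the line format shown in this thread (it may differ between releases); the function names and the 90% "busy" threshold are arbitrary choices for illustration.

```python
import re

# Field patterns for the tegrastats line format shown in this thread.
EMC_RE = re.compile(r"EMC (\d+)%@(\d+)")
GR3D_RE = re.compile(r"GR3D (\d+)%@(\d+)")


def parse_tegrastats_line(line):
    """Extract EMC and GR3D load (%) and clock (MHz), or None if absent."""
    emc, gpu = EMC_RE.search(line), GR3D_RE.search(line)
    if emc is None or gpu is None:
        return None
    return {
        "emc_pct": int(emc.group(1)), "emc_mhz": int(emc.group(2)),
        "gr3d_pct": int(gpu.group(1)), "gr3d_mhz": int(gpu.group(2)),
    }


def summarize(lines, busy_threshold=90):
    """Summarize peak EMC load and how often GR3D is at or above the threshold."""
    samples = [s for s in map(parse_tegrastats_line, lines) if s is not None]
    if not samples:
        return None
    busy = sum(1 for s in samples if s["gr3d_pct"] >= busy_threshold)
    return {
        "samples": len(samples),
        "max_emc_pct": max(s["emc_pct"] for s in samples),
        "gr3d_busy_fraction": busy / len(samples),
    }


SAMPLE = [
    "RAM 4376/7851MB (lfb 465x4MB) cpu [22%@2034,off,off,24%@2034,22%@2035,24%@2036] EMC 68%@1600 APE 150 MSENC 1164 GR3D 99%@1134",
    "RAM 4377/7851MB (lfb 464x4MB) cpu [24%@2000,off,off,22%@2005,19%@2001,16%@2000] EMC 71%@1600 APE 150 MSENC 1164 GR3D 55%@1134",
    "RAM 4378/7851MB (lfb 463x4MB) cpu [24%@2034,off,off,29%@2035,22%@2034,19%@2036] EMC 66%@1600 APE 150 MSENC 1164 GR3D 96%@1134",
]
print(summarize(SAMPLE))
```

Feeding the full log through summarize() gives a single busy fraction for GR3D and a peak EMC load, which makes it easier to compare runs with and without individual pipeline stages.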

Suggest you break down the pipeline to check where the bottleneck is. You may eliminate the encoders first.

Cam1 --> (check fps)
     |
     |--> GPU --> CPU (frame processing, and data annotation ) -->
                                                                  | 
Cam2  --> (check fps)                                             |--> select either cam source --> (check fps)
     |                                                            | 
     |--> GPU --> CPU (frame processing, and data annotation ) -->