Finding the bottleneck in video stitching application

Hi Shane,

Thank you for the stats.

I made a small CPU stress test like this:

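// Needs OpenMP enabled at build time (e.g. -fopenmp with GCC);
// without it the pragma is ignored and only one core is loaded.
int main()
{
    // b is shared by all threads and incremented without synchronization.
    // Its value is meaningless; the loop only exists to keep every core busy.
    unsigned long b = 0;
    #pragma omp parallel for
    for (unsigned long i = 0; i < 100000000000; i++)
    {
        unsigned long c = b++;
        (void)c; // silence the unused-variable warning
    }
    return 0;
}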

I ran it while the following pipeline was running:
fakesrc → stitching → encoder
These are the results WITHOUT the stress test:

RAM 1916/3995MB (lfb 114x4MB) cpu [20%,31%,25%,19%]@1734 EMC 48%@1600 AVP 13%@12 NVDEC 716 MSENC 716 GR3D 84%@998 EDP limit 1734
RAM 1916/3995MB (lfb 114x4MB) cpu [13%,38%,32%,10%]@1734 EMC 48%@1600 AVP 13%@12 NVDEC 716 MSENC 716 GR3D 99%@998 EDP limit 1734
RAM 1916/3995MB (lfb 114x4MB) cpu [13%,54%,23%,4%]@1734 EMC 48%@1600 AVP 14%@12 NVDEC 716 MSENC 716 GR3D 90%@998 EDP limit 1734
RAM 1917/3995MB (lfb 114x4MB) cpu [18%,35%,26%,19%]@1734 EMC 48%@1600 AVP 24%@12 NVDEC 716 MSENC 716 GR3D 66%@998 EDP limit 1734
RAM 1917/3995MB (lfb 114x4MB) cpu [21%,14%,35%,33%]@1734 EMC 48%@1600 AVP 24%@12 NVDEC 716 MSENC 716 GR3D 46%@998 EDP limit 1734
RAM 1916/3995MB (lfb 114x4MB) cpu [34%,20%,34%,17%]@1734 EMC 48%@1600 AVP 17%@12 NVDEC 716 MSENC 716 GR3D 35%@998 EDP limit 1734
RAM 1916/3995MB (lfb 114x4MB) cpu [22%,18%,40%,19%]@1734 EMC 48%@1600 AVP 10%@12 NVDEC 716 MSENC 716 GR3D 29%@998 EDP limit 1734
RAM 1916/3995MB (lfb 114x4MB) cpu [34%,16%,19%,23%]@1734 EMC 48%@1600 AVP 10%@12 NVDEC 716 MSENC 716 GR3D 23%@998 EDP limit 1734
RAM 1916/3995MB (lfb 114x4MB) cpu [34%,14%,26%,25%]@1734 EMC 48%@1600 AVP 25%@12 NVDEC 716 MSENC 716 GR3D 91%@998 EDP limit 1734
RAM 1917/3995MB (lfb 113x4MB) cpu [17%,19%,24%,42%]@1734 EMC 48%@1600 AVP 19%@12 NVDEC 716 MSENC 716 GR3D 70%@998 EDP limit 1734
______________________________________________________________________________________________

FPS: 28.8436
FPS: 30.0348
FPS: 29.8543
FPS: 29.981
FPS: 30.0487
FPS: 29.9896
FPS: 29.9014
FPS: 29.9272
FPS: 29.9179
FPS: 29.9197
FPS: 29.9959
FPS: 29.874
FPS: 29.9642
FPS: 29.9941
FPS: 29.9735
FPS: 30.0379
FPS: 29.8355
FPS: 30.2394
FPS: 29.9546

These are the results WITH the stress test:

RAM 1394/3995MB (lfb 117x4MB) cpu [100%,100%,100%,100%]@1734 EMC 0%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1394/3995MB (lfb 117x4MB) cpu [100%,100%,100%,100%]@1734 EMC 0%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1394/3995MB (lfb 117x4MB) cpu [100%,100%,100%,100%]@1734 EMC 0%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1431/3995MB (lfb 117x4MB) cpu [100%,100%,100%,100%]@1734 EMC 0%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 2%@998 EDP limit 1734
RAM 1567/3995MB (lfb 117x4MB) cpu [100%,100%,100%,89%]@1734 EMC 2%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1708/3995MB (lfb 115x4MB) cpu [100%,100%,100%,87%]@1734 EMC 3%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1883/3995MB (lfb 113x4MB) cpu [100%,100%,100%,100%]@1734 EMC 6%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 2167/3995MB (lfb 113x4MB) cpu [98%,100%,100%,95%]@1734 EMC 25%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 48%@998 EDP limit 1734
RAM 2343/3995MB (lfb 106x4MB) cpu [100%,100%,100%,100%]@1734 EMC 41%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 13%@998 EDP limit 1734
RAM 2344/3995MB (lfb 106x4MB) cpu [100%,100%,100%,100%]@1734 EMC 45%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 16%@998 EDP limit 1734
RAM 2344/3995MB (lfb 106x4MB) cpu [100%,100%,100%,100%]@1734 EMC 46%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 35%@998 EDP limit 1734
RAM 2344/3995MB (lfb 105x4MB) cpu [100%,100%,100%,100%]@1734 EMC 46%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 26%@998 EDP limit 1734
RAM 2344/3995MB (lfb 105x4MB) cpu [100%,100%,100%,100%]@1734 EMC 20%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 2344/3995MB (lfb 105x4MB) cpu [100%,100%,100%,100%]@1734 EMC 5%@1600 AVP 0%@12 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 2344/3995MB (lfb 105x4MB) cpu [100%,100%,100%,100%]@1734 EMC 22%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 47%@998 EDP limit 1734
RAM 2344/3995MB (lfb 105x4MB) cpu [100%,100%,100%,100%]@1734 EMC 39%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 83%@998 EDP limit 1734
RAM 2344/3995MB (lfb 105x4MB) cpu [99%,100%,100%,100%]@1734 EMC 46%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 80%@998 EDP limit 1734
RAM 2343/3995MB (lfb 105x4MB) cpu [100%,100%,100%,100%]@1734 EMC 41%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 20%@998 EDP limit 1734
RAM 2355/3995MB (lfb 105x4MB) cpu [99%,100%,100%,100%]@1734 EMC 43%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 15%@998 EDP limit 1734
RAM 2357/3995MB (lfb 105x4MB) cpu [100%,100%,100%,100%]@1734 EMC 46%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 42%@998 EDP limit 1734
RAM 2344/3995MB (lfb 105x4MB) cpu [100%,100%,100%,100%]@1734 EMC 48%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 30%@998 EDP limit 1734
RAM 2363/3995MB (lfb 103x4MB) cpu [100%,100%,100%,100%]@1734 EMC 46%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 77%@998 EDP limit 1734
RAM 2360/3995MB (lfb 103x4MB) cpu [100%,100%,100%,100%]@1734 EMC 46%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 75%@998 EDP limit 1734
RAM 2344/3995MB (lfb 103x4MB) cpu [100%,99%,100%,100%]@1734 EMC 48%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 71%@998 EDP limit 1734
RAM 2344/3995MB (lfb 103x4MB) cpu [100%,100%,100%,100%]@1734 EMC 45%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 7%@998 EDP limit 1734
RAM 2345/3995MB (lfb 103x4MB) cpu [100%,100%,100%,100%]@1734 EMC 46%@1600 AVP 0%@12 NVDEC 716 MSENC 716 GR3D 81%@998 EDP limit 1734
_______________________________________________________________________
FPS: 32.9517
FPS: 32.8567
FPS: 41.0006
FPS: 29.8882
FPS: 34.2611
FPS: 39.5001
FPS: 37.1921
FPS: 36.4874
FPS: 31.7472
FPS: 33.3305
FPS: 32.7478
FPS: 31.4854
FPS: 29.947
FPS: 38.3856
FPS: 30.8186
FPS: 31.6353

The jumping frame rates during the stress test come from the time measurement I use to artificially limit the frame rate to 30 FPS. I think it is not very accurate under such heavy CPU load.
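For reference, one common way to make a frame limiter less sensitive to scheduling delays is to sleep until absolute deadlines on a monotonic clock, instead of sleeping for a fixed interval after each frame. The sketch below is not the limiter used in the pipeline above; FramePacer is a hypothetical helper name, and only the 30 FPS target comes from this thread.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not from the original pipeline): paces a loop to a
// fixed frame rate using absolute deadlines on a monotonic clock.
class FramePacer {
public:
    using Clock = std::chrono::steady_clock;

    explicit FramePacer(double fps)
        : period_(std::chrono::duration_cast<Clock::duration>(
              std::chrono::duration<double>(1.0 / fps))),
          next_(Clock::now() + period_) {}

    // Block until the next frame deadline, then schedule the following one.
    void wait() {
        std::this_thread::sleep_until(next_);
        const Clock::time_point now = Clock::now();
        next_ += period_;
        // If the thread woke up more than one period late, resynchronize
        // instead of releasing a burst of frames to catch up.
        if (next_ < now) {
            next_ = now + period_;
        }
    }

private:
    Clock::duration period_;   // target time between frames
    Clock::time_point next_;   // deadline of the next frame
};
```

In a push loop this would be used as `FramePacer pacer(30.0);` before the loop and `pacer.wait();` once per frame. Because the deadlines do not depend on when the previous sleep actually returned, a late wake-up on a loaded CPU should not accumulate into the next frame's timing.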

However, the results show that the stitching has no problem with high CPU load: the frame rate never drops below about 29.9 FPS even with all four cores at 100%.
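To compare the two runs with numbers rather than by eye, the FPS logs above can be summarized with a few lines of code. This is only a sketch: the names FpsSummary, summarize, kWithoutStress and kWithStress are made up for illustration, and the values are copied from the two FPS listings in this post.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical summary type for one FPS log.
struct FpsSummary {
    double min;
    double max;
    double mean;
    double stddev;  // population standard deviation
};

// Compute min, max, mean and population standard deviation of the samples.
// Returns all zeros for an empty input.
FpsSummary summarize(const std::vector<double>& fps) {
    if (fps.empty()) {
        return FpsSummary{0.0, 0.0, 0.0, 0.0};
    }
    const double lo = *std::min_element(fps.begin(), fps.end());
    const double hi = *std::max_element(fps.begin(), fps.end());
    const double mean =
        std::accumulate(fps.begin(), fps.end(), 0.0) / fps.size();
    double sq = 0.0;
    for (double v : fps) {
        sq += (v - mean) * (v - mean);
    }
    return FpsSummary{lo, hi, mean, std::sqrt(sq / fps.size())};
}

// FPS samples copied from the run WITHOUT the stress test.
const std::vector<double> kWithoutStress = {
    28.8436, 30.0348, 29.8543, 29.981,  30.0487, 29.9896, 29.9014,
    29.9272, 29.9179, 29.9197, 29.9959, 29.874,  29.9642, 29.9941,
    29.9735, 30.0379, 29.8355, 30.2394, 29.9546};

// FPS samples copied from the run WITH the stress test.
const std::vector<double> kWithStress = {
    32.9517, 32.8567, 41.0006, 29.8882, 34.2611, 39.5001,
    37.1921, 36.4874, 31.7472, 33.3305, 32.7478, 31.4854,
    29.947,  38.3856, 30.8186, 31.6353};
```

Calling `summarize(kWithoutStress)` and `summarize(kWithStress)` and printing the fields gives the minimum (did the pipeline ever fall behind?) and the spread (how noisy is the limiter?) for each run side by side.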

Hi crossfire,
If it is related to memory bandwidth, I suppose the zero-copy approach provided by Aasta should fix this problem.