Low frame rate with FLIR camera on TX2 when using cudafilters library from OpenCV

Hello,
I have a minimal program that captures images from a FLIR USB3 camera. When only capturing images, I get at least 30 fps on both the TX2 and TX1 dev kits. If I instantiate a CUDA filter from OpenCV 3.1 or 3.3 (it does not matter which), I cannot reach 30 fps on the TX2, while it is possible on the TX1. I am using JetPack 3.1, I compiled OpenCV in release mode for compute capabilities 5.3 and 6.2, and I used the jetson_clocks.sh script to boost the clocks. Any idea why I see this difference in performance?
Thanks.

Hi,

To maximize TX2 performance, please try the following commands:

sudo ./jetson_clocks.sh
sudo nvpmodel -m 0

By the way, there is a CPU-based format conversion in OpenCV.
For better performance, we recommend using VisionWorks or direct CUDA processing to avoid the CPU-based converter.
https://devtalk.nvidia.com/default/topic/1010111/jetson-tx1/nvmm-memory/post/5162114/#5162114

Just capturing images from the camera (so without CUDA filters from OpenCV), I can get 50 fps after running only sudo ./jetson_clocks.sh. If I run sudo ./jetson_clocks.sh followed by sudo nvpmodel -m 0, then I cannot get 50 fps.
But I still cannot get 30 fps or more while using CUDA filters from OpenCV. Is there something about the USB3 driver on the TX2 that differs from the TX1? Do the CUDA filters from OpenCV make inefficient use of memory?

Hi,

Thanks for your feedback.

Something needs to be clarified regarding TX2 maximum performance.
We are checking this issue and will update you later.

Hi,

We found that nvpmodel resets the CPU/GPU clocks back to their defaults.
This setting causes poor performance. We are still clarifying the cause.

For now, please run the following commands in this order as a workaround (WAR).

sudo nvpmodel -m 0        #This will enable the two Denver CPU cores
sudo ./jetson_clocks.sh   #This will maximize CPU/GPU performance

Thanks.

But after just running sudo ./jetson_clocks.sh and then querying the power mode with sudo nvpmodel -q, I noticed that the jetson_clocks script already puts the TX2 in MAXN mode (mode 0).

I did further experiments to find the cause of the low frame rate by stripping down the cudafilters library to reduce its size. Starting from OpenCV 3.1:
- I removed all functions from cudafilters except the separable linear filters.
- I restricted the templates so that filters are generated only for the 32FC1 image type.
- I kept only the reflect101 border type.

OpenCV was compiled in release mode, with CUDA fast math, for architecture 6.2 (TX2). In this case, I was able to capture at 50 fps, with a max of 52 fps. I still don’t know why it works out of the box on the TX1. I suspect this has to do with memory usage (constant, shared, and so on) or some kind of global variable, but this is getting too obscure for me. If some luminaries from NVIDIA or a memory guru could shed some light on this discrepancy, that would make my day, since I don’t like sitting on a potential ticking time bomb.

Hi,

nvpmodel remembers the last setting it applied. The reported mode may not reflect the current state if you have rebooted the device.

Which JetPack version do you use?
Can your problem be reproduced with the standard onboard camera or another USB camera?
We want to reproduce this issue on our side. Could you provide source code that lets us do this?

Thanks.

Hi,

We have clarified the nvpmodel issue. It is not a bug, only a misunderstanding.

nvpmodel -m 0

  • Raises max_freq to 1300 MHz (hw-max) and sets min_freq to 114 MHz (hw-min)
  • curr_freq will be between 114 and 1300 MHz

./jetson_clocks.sh

  • Fixes the frequency at 1300 by overwriting min_freq = max_freq = 1300
  • curr_freq is always equal to 1300

nvpmodel sets the valid max/min clocks to the hardware max/min.
jetson_clocks.sh always fixes the clock at the max.

That’s why curr_freq can drop to 114 once nvpmodel is run after jetson_clocks.sh (since min_freq is reset to hw-min).
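The ordering effect described above can be illustrated with a small Python model. This is only a sketch of the explanation given in this post, not the actual implementation of either tool: the class and function names are hypothetical, the values are the GPU clocks quoted in the post, and the starting limits are an assumption.

```python
from dataclasses import dataclass

HW_MIN = 114   # hardware minimum clock quoted in the post
HW_MAX = 1300  # hardware maximum clock quoted in the post


@dataclass(frozen=True)
class ClockLimits:
    """Hypothetical stand-in for the min_freq/max_freq pair of one clock."""
    min_freq: int
    max_freq: int

    def possible_range(self):
        # curr_freq may sit anywhere between min_freq and max_freq.
        return (self.min_freq, self.max_freq)


def nvpmodel_m0(limits):
    """Model of `nvpmodel -m 0`: open the range to hw-min .. hw-max."""
    return ClockLimits(HW_MIN, HW_MAX)


def jetson_clocks(limits):
    """Model of `jetson_clocks.sh`: overwrite min_freq = max_freq = hw-max."""
    return ClockLimits(HW_MAX, HW_MAX)


start = ClockLimits(HW_MIN, HW_MAX)  # assumed starting state

# nvpmodel first, then jetson_clocks: the clock stays pinned at the maximum.
pinned = jetson_clocks(nvpmodel_m0(start))
# jetson_clocks first, then nvpmodel: min_freq is reset, so the clock may drop.
unpinned = nvpmodel_m0(jetson_clocks(start))

print("nvpmodel -> jetson_clocks:", pinned.possible_range())
print("jetson_clocks -> nvpmodel:", unpinned.possible_range())
```

In this model the last command run decides whether min_freq stays at the maximum, which matches the recommendation to run nvpmodel first and jetson_clocks.sh second.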

Thanks for the clarification about nvpmodel.

I use JetPack 3.1. So far, I have only tested with a FLIR camera. I posted example code on GitHub: frmir/flir-capture
I don’t have any other USB camera for now.

Thanks. We will check this issue and update information to you.

Hi,

We tried to reproduce this issue but ran into errors.
The code depends on “flycapture/FlyCapture2.h”, but this file is not available.

Could you help us check this?

To use this example, you need a FLIR camera. It comes with a library (named flycapture) for capturing images. This library can be downloaded from FLIR. Since you may not have a FLIR USB3 camera, you can try another USB3 camera and see if the problem is reproducible.

I did further tests. Using the example I provided and OpenCV 3.3.0, the camera frame rate becomes low only after stressing the platform beforehand. For example, run make clean and make -j6 on one of your projects after ./jetson_clocks.sh, wait for the build to finish, then start the example. If the capture is done right after a reboot and jetson_clocks.sh, the frame rate is fine.

Hi,

Could you share the following device information for both cases?

sudo ./tegrastats
sudo nvpmodel -q --verbose

Thanks.

Here are the stats after a clean reboot, jetson_clocks.sh and running the example. In this case, I get the requested fps from the camera:

sudo ./tegrastats

RAM 1505/7851MB (lfb 1402x4MB) cpu [34%@2034,9%@2013,100%@2034,26%@2034,30%@2034,31%@2035] EMC 13%@1866 APE 150 GR3D 5%@1300
RAM 1507/7851MB (lfb 1402x4MB) cpu [31%@2032,58%@2034,52%@2034,28%@2032,28%@2032,36%@2033] EMC 13%@1866 APE 150 GR3D 5%@1300
RAM 1506/7851MB (lfb 1402x4MB) cpu [27%@2033,32%@2035,77%@2036,29%@2035,26%@2032,23%@2033] EMC 13%@1866 APE 150 GR3D 0%@1300
RAM 1507/7851MB (lfb 1402x4MB) cpu [31%@2035,9%@2034,100%@2036,26%@2034,29%@2032,39%@2035] EMC 13%@1866 APE 150 GR3D 6%@1300
RAM 1507/7851MB (lfb 1402x4MB) cpu [23%@2032,84%@2035,31%@2034,19%@2033,33%@2033,29%@2034] EMC 13%@1866 APE 150 GR3D 5%@1300
RAM 1508/7851MB (lfb 1402x4MB) cpu [19%@2033,48%@2035,70%@2034,25%@2035,27%@2033,22%@2035] EMC 13%@1866 APE 150 GR3D 5%@1300
RAM 1507/7851MB (lfb 1402x4MB) cpu [40%@2035,33%@2035,64%@2034,29%@2033,31%@2033,33%@2033] EMC 13%@1866 APE 150 GR3D 6%@1300
RAM 1507/7851MB (lfb 1402x4MB) cpu [23%@2033,49%@2035,65%@2035,21%@2033,26%@2034,34%@2036] EMC 13%@1866 APE 150 GR3D 6%@1300
RAM 1507/7851MB (lfb 1402x4MB) cpu [36%@2034,31%@2035,60%@2035,34%@2032,30%@2031,32%@2030] EMC 13%@1866 APE 150 GR3D 4%@1300
RAM 1507/7851MB (lfb 1402x4MB) cpu [30%@2034,58%@2035,54%@2035,30%@2034,20%@2033,27%@2033] EMC 13%@1866 APE 150 GR3D 0%@1300

sudo nvpmodel -q --verbose

NVPM VERB: parsing done for /etc/nvpmodel.conf
NVPM VERB: Current mode: NV Power Mode: MAXN
0
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_1: PATH /sys/devices/system/cpu/cpu1/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_2: PATH /sys/devices/system/cpu/cpu2/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_3: PATH /sys/devices/system/cpu/cpu3/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_4: PATH /sys/devices/system/cpu/cpu4/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_5: PATH /sys/devices/system/cpu/cpu5/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_A57: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq: REAL_VAL: 2035200 CONF_VAL: 0
NVPM VERB: PARAM CPU_A57: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM CPU_DENVER: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq: REAL_VAL: 2035200 CONF_VAL: 0
NVPM VERB: PARAM CPU_DENVER: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU: ARG MIN_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq: REAL_VAL: 1300500000 CONF_VAL: 0
NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq: REAL_VAL: 1300500000 CONF_VAL: 2147483647
NVPM VERB: PARAM EMC: ARG MAX_FREQ: PATH /sys/kernel/nvpmodel_emc_cap/emc_iso_cap: REAL_VAL: 0 CONF_VAL: 0

Then I run make clean and make -j6 on one of my projects to stress the platform, and wait for the build to finish. After that I run the example. In this case, I get a lower fps than requested:

sudo ./tegrastats

RAM 1537/7851MB (lfb 989x4MB) cpu [31%@2034,4%@2035,2%@2035,20%@2034,32%@2033,37%@2035] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1536/7851MB (lfb 989x4MB) cpu [30%@2035,3%@2036,3%@2035,34%@2035,32%@2033,27%@2035] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1538/7851MB (lfb 989x4MB) cpu [34%@2035,2%@2035,2%@2087,32%@2034,30%@2034,28%@2035] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1536/7851MB (lfb 989x4MB) cpu [30%@2033,4%@2034,2%@2035,31%@2034,31%@2033,30%@2034] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1536/7851MB (lfb 989x4MB) cpu [37%@2033,3%@2036,3%@2035,26%@2036,34%@2036,26%@2035] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1536/7851MB (lfb 989x4MB) cpu [35%@2031,4%@2034,4%@2035,22%@2033,32%@2034,34%@2033] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1538/7851MB (lfb 989x4MB) cpu [24%@2034,3%@2036,3%@2036,31%@2036,28%@2035,37%@2034] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1538/7851MB (lfb 989x4MB) cpu [30%@2034,3%@2036,3%@2036,32%@2033,28%@2035,33%@2036] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1537/7851MB (lfb 989x4MB) cpu [33%@2032,3%@2035,3%@2034,26%@2033,38%@2034,26%@2032] EMC 11%@1866 APE 150 GR3D 6%@1300
RAM 1535/7851MB (lfb 989x4MB) cpu [40%@2036,3%@2035,3%@2036,36%@2034,22%@2035,27%@2037] EMC 11%@1866 APE 150 GR3D 1%@1300

sudo nvpmodel -q --verbose

NVPM VERB: parsing done for /etc/nvpmodel.conf
NVPM VERB: Current mode: NV Power Mode: MAXN
0
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_1: PATH /sys/devices/system/cpu/cpu1/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_2: PATH /sys/devices/system/cpu/cpu2/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_3: PATH /sys/devices/system/cpu/cpu3/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_4: PATH /sys/devices/system/cpu/cpu4/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_ONLINE: ARG CORE_5: PATH /sys/devices/system/cpu/cpu5/online: REAL_VAL: 1 CONF_VAL: 1
NVPM VERB: PARAM CPU_A57: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq: REAL_VAL: 2035200 CONF_VAL: 0
NVPM VERB: PARAM CPU_A57: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM CPU_DENVER: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq: REAL_VAL: 2035200 CONF_VAL: 0
NVPM VERB: PARAM CPU_DENVER: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU: ARG MIN_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq: REAL_VAL: 1300500000 CONF_VAL: 0
NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq: REAL_VAL: 1300500000 CONF_VAL: 2147483647
NVPM VERB: PARAM EMC: ARG MAX_FREQ: PATH /sys/kernel/nvpmodel_emc_cap/emc_iso_cap: REAL_VAL: 0 CONF_VAL: 0
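To check from this output that the clocks are still pinned, one option is to compare the REAL_VAL of each MIN_FREQ/MAX_FREQ pair. The Python sketch below does that on a few lines copied from the dump above; the function names are hypothetical and the regular expression assumes the line layout shown in this post.

```python
import re

# Matches lines such as:
# NVPM VERB: PARAM GPU: ARG MIN_FREQ: PATH /sys/...: REAL_VAL: 1300500000 CONF_VAL: 0
LIMIT_LINE = re.compile(
    r"PARAM (\w+): ARG (MIN_FREQ|MAX_FREQ): PATH \S+ REAL_VAL: (\d+)"
)

SAMPLE = """\
NVPM VERB: PARAM CPU_A57: ARG MIN_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq: REAL_VAL: 2035200 CONF_VAL: 0
NVPM VERB: PARAM CPU_A57: ARG MAX_FREQ: PATH /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: REAL_VAL: 2035200 CONF_VAL: 2147483647
NVPM VERB: PARAM GPU: ARG MIN_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/min_freq: REAL_VAL: 1300500000 CONF_VAL: 0
NVPM VERB: PARAM GPU: ARG MAX_FREQ: PATH /sys/devices/17000000.gp10b/devfreq/17000000.gp10b/max_freq: REAL_VAL: 1300500000 CONF_VAL: 2147483647
NVPM VERB: PARAM EMC: ARG MAX_FREQ: PATH /sys/kernel/nvpmodel_emc_cap/emc_iso_cap: REAL_VAL: 0 CONF_VAL: 0
"""


def real_limits(text):
    """Collect REAL_VAL per parameter, e.g. {'GPU': {'MIN_FREQ': ..., 'MAX_FREQ': ...}}."""
    limits = {}
    for match in LIMIT_LINE.finditer(text):
        param, arg, value = match.groups()
        limits.setdefault(param, {})[arg] = int(value)
    return limits


def pinned(limits):
    """For parameters that report both limits, tell whether min equals max."""
    return {
        param: vals["MIN_FREQ"] == vals["MAX_FREQ"]
        for param, vals in limits.items()
        if "MIN_FREQ" in vals and "MAX_FREQ" in vals
    }


status = pinned(real_limits(SAMPLE))
print(status)
```

Parameters that report only one limit (EMC here) are skipped, since there is no pair to compare.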

Hi,

It looks like the difference comes from the two Denver cores (CPU1 and CPU2).
Could you help us check the behavior with ‘make -j4’ or ‘make’?
Please share the tegrastats results.

Thanks.
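The per-core comparison mentioned here can be reproduced by parsing the cpu [...] field of each tegrastats line. Below is a minimal Python sketch using two lines from each of the dumps posted above (trimmed to the cpu field); the helper names are hypothetical and the parser assumes the `load%@freq` layout shown in this thread.

```python
import re

CPU_FIELD = re.compile(r"cpu \[([^\]]+)\]")

# Two lines from the run after a clean reboot (requested fps reached).
GOOD = [
    "cpu [34%@2034,9%@2013,100%@2034,26%@2034,30%@2034,31%@2035]",
    "cpu [31%@2032,58%@2034,52%@2034,28%@2032,28%@2032,36%@2033]",
]
# Two lines from the run after make -j6 (lower fps than requested).
STRESSED = [
    "cpu [31%@2034,4%@2035,2%@2035,20%@2034,32%@2033,37%@2035]",
    "cpu [30%@2035,3%@2036,3%@2035,34%@2035,32%@2033,27%@2035]",
]


def parse_cpu_loads(line):
    """Return [(load_percent, freq_mhz), ...] for one tegrastats line."""
    match = CPU_FIELD.search(line)
    if match is None:
        raise ValueError("no cpu field found")
    cores = []
    for entry in match.group(1).split(","):
        load, freq = entry.split("%@")
        cores.append((int(load), int(freq)))
    return cores


def mean_loads(lines):
    """Average load per core index over several tegrastats lines."""
    per_line = [[load for load, _ in parse_cpu_loads(l)] for l in lines]
    return [sum(column) / len(per_line) for column in zip(*per_line)]


good = mean_loads(GOOD)
stressed = mean_loads(STRESSED)
for core, (g, s) in enumerate(zip(good, stressed)):
    print(f"cpu{core}: good {g:.1f}%  stressed {s:.1f}%")
```

With these samples, cpu1 and cpu2 carry a large share of the load in the good run and are almost idle in the stressed run, while the other cores look similar in both.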

ok. Here are the results I gathered.

First I rebooted. Then jetson_clocks.sh. After that, I made sure I get the requested 50fps by running the example. I get between 19 and 21ms for the image period (50fps), which is good.

Then, I do a make clean, make. Once done, I run the example. Requested 50fps but I get between 22 and 24ms for the image period, so 42 to 45fps.

RAM 1487/7851MB (lfb 1355x4MB) cpu [30%@2001,4%@2034,5%@2034,31%@2008,28%@2012,32%@2008] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1485/7851MB (lfb 1355x4MB) cpu [29%@2034,5%@2036,4%@2012,29%@2034,33%@2035,30%@2034] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1485/7851MB (lfb 1355x4MB) cpu [25%@2031,5%@2036,4%@2035,30%@2035,28%@2034,38%@2035] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1486/7851MB (lfb 1355x4MB) cpu [27%@2035,3%@2035,4%@2035,38%@2035,29%@2034,32%@2035] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1488/7851MB (lfb 1355x4MB) cpu [30%@2034,5%@2035,3%@2035,28%@2035,32%@2035,25%@2035] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1485/7851MB (lfb 1355x4MB) cpu [27%@2031,6%@2034,3%@2034,34%@2034,27%@2033,38%@2035] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1487/7851MB (lfb 1355x4MB) cpu [34%@2034,5%@2034,4%@2036,28%@2035,32%@2033,25%@2034] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1486/7851MB (lfb 1355x4MB) cpu [36%@2033,4%@2036,6%@2035,33%@2032,25%@2032,30%@2033] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1485/7851MB (lfb 1355x4MB) cpu [35%@2032,5%@2035,3%@2035,29%@2033,31%@2034,29%@2035] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1487/7851MB (lfb 1355x4MB) cpu [34%@2034,4%@2035,3%@2035,27%@2035,32%@2033,29%@2035] EMC 12%@1866 APE 150 GR3D 6%@1300

After make clean, make -j4, I request 50fps but get between 26 and 28ms for image period, so 35 to 38fps.

RAM 1589/7851MB (lfb 1110x4MB) cpu [40%@2033,3%@2035,3%@2037,33%@2035,26%@2034,29%@2033] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1589/7851MB (lfb 1110x4MB) cpu [33%@2032,4%@2036,3%@2034,29%@2036,33%@2031,27%@2034] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1589/7851MB (lfb 1110x4MB) cpu [31%@2032,5%@2035,4%@2036,33%@2034,27%@2034,32%@2032] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1590/7851MB (lfb 1110x4MB) cpu [27%@2036,4%@2036,4%@2035,35%@2035,32%@2034,33%@2034] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1591/7851MB (lfb 1110x4MB) cpu [26%@2034,4%@2035,3%@2035,33%@2034,26%@2035,35%@2034] EMC 12%@1866 APE 150 GR3D 5%@1300
RAM 1591/7851MB (lfb 1110x4MB) cpu [35%@2036,5%@2035,3%@2036,33%@2034,27%@2024,35%@2034] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1589/7851MB (lfb 1110x4MB) cpu [34%@2033,3%@2035,3%@2035,35%@2034,34%@2033,29%@2034] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1588/7851MB (lfb 1110x4MB) cpu [26%@2030,4%@2034,2%@2035,30%@2031,35%@2033,29%@2033] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1588/7851MB (lfb 1110x4MB) cpu [32%@2033,4%@2035,3%@2036,34%@2032,33%@2032,27%@2034] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1590/7851MB (lfb 1110x4MB) cpu [28%@2034,4%@2035,3%@2036,32%@2034,31%@2033,30%@2035] EMC 12%@1866 APE 150 GR3D 6%@1300

Finally, after make clean, make -j6, I request 50fps again but get between 26 and 28ms for image period again, so 35 to 38fps.

RAM 1591/7851MB (lfb 985x4MB) cpu [39%@2032,4%@2036,3%@2037,28%@2035,32%@2035,26%@2034] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1592/7851MB (lfb 985x4MB) cpu [34%@2032,2%@2034,4%@2033,33%@2036,30%@2032,29%@2034] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1590/7851MB (lfb 985x4MB) cpu [29%@2031,5%@2035,3%@2036,28%@2035,35%@2034,30%@2030] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1591/7851MB (lfb 985x4MB) cpu [38%@2033,4%@2036,3%@2036,34%@2034,28%@2036,26%@2033] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1594/7851MB (lfb 985x4MB) cpu [29%@2034,3%@2036,3%@2035,32%@2033,28%@2033,31%@2036] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1593/7851MB (lfb 985x4MB) cpu [37%@2033,4%@2035,4%@2035,26%@2035,30%@2034,33%@2032] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1591/7851MB (lfb 985x4MB) cpu [37%@2033,3%@2036,2%@2034,28%@2035,34%@2032,26%@2033] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1592/7851MB (lfb 985x4MB) cpu [34%@2032,5%@2035,5%@2035,31%@2035,27%@2033,34%@2033] EMC 12%@1866 APE 150 GR3D 6%@1300
RAM 1591/7851MB (lfb 985x4MB) cpu [27%@2032,3%@2036,2%@2035,39%@2033,30%@2033,31%@2033] EMC 12%@1866 APE 150 GR3D 0%@1300
RAM 1591/7851MB (lfb 985x4MB) cpu [41%@2034,4%@2035,4%@2036,22%@2034,38%@2035,27%@2034] EMC 12%@1866 APE 150 GR3D 0%@1300
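For reference, the frame rates quoted in this post follow from the measured image periods as fps = 1000 / period_ms. A short Python sketch of that arithmetic, using the period ranges reported above (the labels are just descriptive strings):

```python
def fps(period_ms):
    """Convert an image period in milliseconds to frames per second."""
    return 1000.0 / period_ms


# (shortest, longest) image period in ms, as reported in this post.
measurements = {
    "after reboot": (19, 21),
    "after make": (22, 24),
    "after make -j4": (26, 28),
    "after make -j6": (26, 28),
}

for label, (shortest, longest) in measurements.items():
    # The longest period gives the lowest frame rate and vice versa.
    print(f"{label}: {fps(longest):.1f} to {fps(shortest):.1f} fps")
```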

Hi,

We first want to clarify whether this is a camera issue or a GPU issue.
Could you help us by changing the input to an image (maybe via cv::imread(…)) and trying to reproduce it again?

Thanks.

Hi,
I modified the example to get images with the VideoCapture class from OpenCV. Images are 868x600 pixels. After a reboot and applying jetson_clocks.sh, I can process images at 111 fps (9 ms).

RAM 1354/7851MB (lfb 1454x4MB) cpu [34%@2033,0%@2034,3%@2035,53%@2033,18%@2036,14%@2034] EMC 12%@1866 APE 150 GR3D 12%@1300
RAM 1354/7851MB (lfb 1454x4MB) cpu [18%@2033,0%@2036,3%@2036,33%@2035,36%@2034,33%@2034] EMC 12%@1866 APE 150 GR3D 7%@1300
RAM 1354/7851MB (lfb 1453x4MB) cpu [8%@2034,0%@2035,3%@2035,39%@2034,15%@2033,57%@2035] EMC 12%@1866 APE 150 GR3D 3%@1300
RAM 1354/7851MB (lfb 1453x4MB) cpu [15%@2034,0%@2034,0%@2036,23%@2035,28%@2036,51%@2035] EMC 12%@1866 APE 150 GR3D 12%@1300
RAM 1354/7851MB (lfb 1453x4MB) cpu [48%@2034,0%@2034,0%@2035,32%@2038,11%@2036,27%@2034] EMC 12%@1866 APE 150 GR3D 7%@1300

After make clean and make -j6 on a big project of mine, I ran the example and could process images at the same rate of 111 fps (9 ms), so I did not notice a difference:

RAM 1478/7851MB (lfb 1001x4MB) cpu [16%@2036,0%@2036,0%@2036,22%@2035,50%@2034,28%@2034] EMC 12%@1866 APE 150 GR3D 7%@1300
RAM 1478/7851MB (lfb 1001x4MB) cpu [34%@2035,0%@2036,0%@2034,20%@2035,16%@2034,51%@2035] EMC 12%@1866 APE 150 GR3D 9%@1300
RAM 1478/7851MB (lfb 1001x4MB) cpu [32%@2035,0%@2034,0%@2035,59%@2035,14%@2034,12%@2033] EMC 12%@1866 APE 150 GR3D 3%@1300
RAM 1478/7851MB (lfb 1001x4MB) cpu [10%@2034,0%@2035,0%@2035,47%@2033,35%@2034,24%@2034] EMC 12%@1866 APE 150 GR3D 3%@1300
RAM 1478/7851MB (lfb 1001x4MB) cpu [22%@2035,0%@2034,0%@2034,53%@2035,23%@2035,21%@2035] EMC 12%@1866 APE 150 GR3D 5%@1300