Jetson Nano ISP functionality performance

Hi all,

I’ve been trying to get multi-camera capture working using the Nano’s ISP functionality. Our test system consists of four 1MP RGGB Bayer cameras running in 10-bit mode and streaming at 30 fps.

Until now I have been using the plain V4L2 driver, capturing RAW10 images and processing them with custom-written CUDA kernels. This setup achieves good performance (low CPU usage).
Note: to avoid cluttering this post, I include a single representative tegrastats sample for each case; it closely matches the mean usage.

RAM 1354/3956MB (lfb 426x4MB) SWAP 0/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU[15%@1479,5%@1479,3%@1479,6%@1479] EMC_FREQ 7%@1600 GR3D_FREQ 61%@537 VIC_FREQ 0%@140 APE 25 PLL@26.5C CPU@28.5C PMIC@100C GPU@25C AO@35C thermal@27C POM_5V_IN 2938/3146 POM_5V_GPU 442/498 POM_5V_CPU 442/546
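For context, here is a stripped-down sketch of that V4L2 + CUDA path (one instance per camera; error handling and the CUDA kernels themselves are omitted, and the device path and sensor mode are placeholders for our setup):

// Simplified sketch of our existing pipeline: plain V4L2 RAW10 capture.
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>
#include <cstdio>

int main()
{
    int fd = open("/dev/video0", O_RDWR);               // placeholder device

    v4l2_format fmt = {};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width  = 1280;                           // ~1MP sensor mode (placeholder)
    fmt.fmt.pix.height = 800;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_SRGGB10;      // 10-bit RGGB Bayer
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    v4l2_requestbuffers req = {};
    req.count  = 4;
    req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);

    void *bufStart[4];
    for (unsigned i = 0; i < req.count; ++i) {
        v4l2_buffer buf = {};
        buf.type = req.type; buf.memory = req.memory; buf.index = i;
        ioctl(fd, VIDIOC_QUERYBUF, &buf);
        bufStart[i] = mmap(nullptr, buf.length, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, buf.m.offset);
        ioctl(fd, VIDIOC_QBUF, &buf);
    }

    int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);

    for (;;) {
        v4l2_buffer buf = {};
        buf.type = req.type; buf.memory = req.memory;
        ioctl(fd, VIDIOC_DQBUF, &buf);           // blocks until a frame arrives
        // bufStart[buf.index] now holds one RAW10 frame; in the real app this
        // is handed to our CUDA debayer/denoise kernels (not shown here).
        printf("frame seq=%u ts=%ld.%06ld\n", buf.sequence,
               buf.timestamp.tv_sec, buf.timestamp.tv_usec);
        ioctl(fd, VIDIOC_QBUF, &buf);
    }
}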

For functionality like auto exposure, debayering, denoising, etc., we were planning to use Argus.
Running the argus-camera sample application and selecting multiSession with 4 cameras uses about half of all CPU resources:

RAM 1608/3956MB (lfb 418x4MB) SWAP 0/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [49%@1479,70%@1479,48%@1479,50%@1479] EMC_FREQ 17%@1600 GR3D_FREQ 32%@230 VIC_FREQ 31%@140 APE 25 PLL@30.5C CPU@32.5C PMIC@100C GPU@28.5C AO@38.5C thermal@30.25C POM_5V_IN 4340/4313 POM_5V_GPU 200/186 POM_5V_CPU 1446/1392

I have tried everything I could think of:

  • Using a cuEGLStream consumer
  • Using an EGL renderer consumer
  • Using Buffer streams instead of EGL streams
  • Using a single session with multiple streams
  • Using multi-session with a single stream per session

All variants end up using about 50% of all four CPU cores.
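For reference, the multi-session variant I tried looks roughly like this (condensed from the Argus samples; interface/error checks, cleanup and the consumer threads are omitted, and the resolution is a placeholder for our sensor mode):

// One CaptureSession, one EGL output stream and one FrameConsumer per camera.
#include <Argus/Argus.h>
#include <EGLStream/EGLStream.h>
#include <vector>

using namespace Argus;

int main()
{
    UniqueObj<CameraProvider> provider(CameraProvider::create());
    ICameraProvider *iProvider = interface_cast<ICameraProvider>(provider);

    std::vector<CameraDevice*> devices;
    iProvider->getCameraDevices(&devices);       // expecting our 4 cameras here

    std::vector<OutputStream*> streams;
    std::vector<EGLStream::FrameConsumer*> consumers;

    // One session (and therefore one independent sensor/ISP control loop) per camera.
    for (CameraDevice *dev : devices) {
        CaptureSession *session = iProvider->createCaptureSession(dev);
        ICaptureSession *iSession = interface_cast<ICaptureSession>(session);

        UniqueObj<OutputStreamSettings> settings(
            iSession->createOutputStreamSettings(STREAM_TYPE_EGL));
        IEGLOutputStreamSettings *iSettings =
            interface_cast<IEGLOutputStreamSettings>(settings);
        iSettings->setPixelFormat(PIXEL_FMT_YCbCr_420_888);
        iSettings->setResolution(Size2D<uint32_t>(1280, 800));   // placeholder

        OutputStream *stream = iSession->createOutputStream(settings.get());
        streams.push_back(stream);
        consumers.push_back(EGLStream::FrameConsumer::create(stream));

        Request *request = iSession->createRequest();
        IRequest *iRequest = interface_cast<IRequest>(request);
        iRequest->enableOutputStream(stream);
        iSession->repeat(request);               // continuous 30 fps capture
    }

    // ... consumer threads acquire and release frames here ...
    return 0;
}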

Because our original bare V4L2 path was working well, I decided to try the new V4L2 Argus path exposed in JetPack 4.5, but had limited success and ran into the following issues:

  • The YUV capture format is only available when opening the device through v4l2_open() calls, not the regular open() calls. This is not a problem by itself.
  • However, opening a second camera with v4l2_open() does not work; it looks like something is hogging resources when the first camera is opened. Even when the O_NONBLOCK flag is passed to v4l2_open(), the following error is returned:

v4l2_open(/dev/video1, 00000002)
,Opening in BLOCKING MODE
(NvCameraUtils) Error InvalidState: Mutex already initialized (in Mutex.cpp, function initialize(), line 41)
(Argus) Error InvalidState: (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function open(), line 54)
(Argus) Error InvalidState: (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function openSocketConnection(), line 258)
(Argus) Error InvalidState: Cannot create camera provider (in src/rpc/socket/client/SocketClientDispatch.cpp, function createCameraProvider(), line 102)
ArgusV4L2_Open failed: Invalid argument
Opening in BLOCKING MODE
1614068772:210:540 Error : Device not support V4L2_CAP_VIDEO_CAPTURE_MPLANE

  • CPU usage for just a single camera is already twice that of our original V4L2 app capturing 4 cameras:
    RAM 1565/3956MB (lfb 385x4MB) SWAP 0/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [20%@1479,17%@1479,20%@1479,20%@1479] EMC_FREQ 7%@1600 GR3D_FREQ 31%@153 VIC_FREQ 0%@140 APE 25 PLL@28.5C CPU@30C PMIC@100C GPU@27C AO@37C thermal@28.75C POM_5V_IN 3019/3099 POM_5V_GPU 120/133 POM_5V_CPU 724/77

  • Another strange thing: with the Argus V4L2 pipeline, the timestamps in the buffers are in milliseconds and start at 0 for the first frame (so not system time, which makes it hard to sync with other types of sensors), and the sequence member of the v4l2_buffer does not increment. A stripped-down sketch of how I exercise this path is shown below.
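This is roughly how I exercise the JetPack 4.5 Argus V4L2 path through libv4l2 (format negotiation and buffer setup are trimmed; the buffer type and fields shown are what I use in my test and may need adjusting). The interesting bits are the second v4l2_open() call, which fails as logged above, and the timestamp/sequence fields of the dequeued buffers:

#include <fcntl.h>
#include <libv4l2.h>
#include <linux/videodev2.h>
#include <cstdio>

int main()
{
    // First camera opens fine through the Argus V4L2 plugin.
    int fd0 = v4l2_open("/dev/video0", O_RDWR | O_NONBLOCK);

    // Second camera: this is the call that produces the
    // "Mutex already initialized" / "Cannot create camera provider" errors above.
    int fd1 = v4l2_open("/dev/video1", O_RDWR | O_NONBLOCK);
    if (fd1 < 0)
        perror("v4l2_open /dev/video1");

    // ... VIDIOC_S_FMT / VIDIOC_REQBUFS / VIDIOC_QBUF / VIDIOC_STREAMON on fd0 ...

    for (;;) {
        v4l2_buffer buf = {};
        buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;   // single-plane assumed here
        buf.memory = V4L2_MEMORY_MMAP;
        if (v4l2_ioctl(fd0, VIDIOC_DQBUF, &buf) < 0)
            continue;
        // With the Argus V4L2 path, timestamp starts at 0 for the first frame
        // (millisecond-based, not system time) and sequence stays at 0.
        printf("seq=%u ts=%ld.%06ld\n", buf.sequence,
               buf.timestamp.tv_sec, buf.timestamp.tv_usec);
        v4l2_ioctl(fd0, VIDIOC_QBUF, &buf);
    }
}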

One more thing in general: I have noticed that in a single-session capture with multiple streams, the metadata returned for all streams is a duplicate of the first stream’s. Even when the output stream is set to TYPE_BUFFER and buffers are captured at the consumers, the EGL image data inside the buffers differs per stream, but when getting the IBuffer interface and querying its metadata you always get the metadata of the first stream, even though the buffer belongs to another stream. The only workaround I have found is using multiSession capture.
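To make that concrete, this is roughly how I read the metadata off a buffer acquired from the second stream’s consumer (condensed, interface checks omitted; the exact getter names are from memory, so treat this as a sketch):

#include <Argus/Argus.h>
#include <cstdio>

using namespace Argus;

void dumpBufferMetadata(Buffer *buf)
{
    // Query metadata through the buffer, as described above.
    IBuffer *iBuffer = interface_cast<IBuffer>(buf);
    const CaptureMetadata *md = iBuffer->getMetadata();
    const ICaptureMetadata *iMd = interface_cast<const ICaptureMetadata>(md);

    // In my tests these values match the FIRST stream of the session, even
    // when 'buf' was acquired from another stream; the image data itself is
    // correct for the stream it came from.
    printf("sensor ts = %llu exposure = %llu\n",
           (unsigned long long)iMd->getSensorTimestamp(),
           (unsigned long long)iMd->getSensorExposureTime());
}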

There have been posts about high ISP-related CPU usage on the Jetson TX2/Xavier, but since the TX1/Nano has a different ISP pipeline I am posting this in the Jetson Nano category.

Can somebody please verify that 50% usage on all CPU cores for capturing four 1MP streams is what can be expected from Nano ISP performance?

Best regards,
Nico

hello Nico,

it’s correct that a single session duplicates the first stream’s settings to the other streams. A single session with multiple streams is usually the use case for frame synchronization;
hence, you should use multi-session if you’re going to control each sensor individually.

regarding CPU usage for the multiple-camera use case: because the Argus camera path uses EGL streams for rendering frames to the display, it will consume more CPU than the standard v4l2 utility.
tegrastats shows instantaneous CPU usage; you may boost the CPU clocks to maximum to collect correct usage reports.

besides tegrastats,
you may also use the top command. In Solaris mode, CPU usage is expressed as a percentage of total CPU time across all cores; you can toggle Irix/Solaris modes with the ‘I’ interactive command to check the averaged CPU usage.
thanks

Hi @JerryChang,

Thanks for the follow-up. If multiSession is required to get per-stream timestamps in a frame-synchronized camera setup, that’s not a problem for me as long as the performance is good.

I have adapted the syncSensor Argus sample to perform multiSession capturing and removed all processing (main.cpp (16.6 KB) ). It just acquires frames and releases them again: no creation of CUDA surfaces, no histogram computation, no rendering. The consumer loop is essentially what is shown below. Although this reduced the CPU usage to some degree, the difference between our original RAW10 + CUDA processing pipeline and the Argus pipeline remains huge.
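For completeness, the stripped-down per-camera consumer loop now looks roughly like this (error handling and thread setup omitted):

#include <Argus/Argus.h>
#include <EGLStream/EGLStream.h>

using namespace Argus;
using namespace EGLStream;

void consumeFrames(FrameConsumer *consumer)
{
    IFrameConsumer *iFrameConsumer = interface_cast<IFrameConsumer>(consumer);

    while (true) {
        // Blocks until the producer (Argus) pushes the next frame.
        UniqueObj<Frame> frame(iFrameConsumer->acquireFrame());
        IFrame *iFrame = interface_cast<IFrame>(frame);
        if (!iFrame)
            break;
        // No processing at all: when the UniqueObj goes out of scope the frame
        // is destroyed and returned to the stream immediately.
    }
}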

Here are my timing results with Irix mode ON/OFF. They show that, regardless of Irix mode, the Argus pipeline consumes 6x to 7x as much CPU, even with NV Power Mode: MAXN.

Custom V4L pipeline Irix mode ON

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12982 nico 20 0 8585216 101352 91064 S 12.5 2.5 0:16.47 camera_recorder

Custom V4L pipeline Irix mode OFF

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12982 nico 20 0 8585216 101352 91064 S 3.1 2.5 0:28.57 camera_recorder

Argus pipeline Irix mode ON

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4779 root 20 0 10.697g 485940 38508 S 78.3 12.0 3:53.09 nvargus-daemon
13235 nico 20 0 9007432 40564 25064 S 10.3 1.0 0:01.60 argus_syncsenso

Argus pipeline Irix mode OFF

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4779 root 20 0 10.698g 484688 38824 S 19.7 12.0 3:45.42 nvargus-daemon
13235 nico 20 0 9007432 40564 25064 S 2.6 1.0 0:00.59 argus_syncsenso

Best regards,
Nico

hello Nico,

please configure Irix mode OFF to determine the actual CPU usage.
Argus does consume more CPU than the standard v4l2 utility, as I mentioned previously.

may I know…
how many cameras are you going to enable for your use case?
what are your criteria for CPU usage? Are there any issues with your use case?
thanks