I’ve been trying to get multi-camera capture working using the Nano’s ISP. Our test system consists of four 1MP RGGB Bayer cameras in 10-bit mode, streaming at 30fps.
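For a sense of scale, the raw sensor data rate of this setup can be estimated with simple arithmetic. This is only a rough sketch: it assumes exactly 1,000,000 pixels per frame (the exact resolution is not stated here), and for the unpacked case it assumes each 10-bit sample is stored in a 16-bit container, which is an assumption about the memory layout rather than something measured.

```python
def raw_rate_mb_per_s(cameras, pixels, bits_per_pixel, fps):
    """Aggregate raw data rate in MB/s (1 MB = 1e6 bytes)."""
    return cameras * pixels * bits_per_pixel * fps / 8 / 1e6


# Assumed: 1,000,000 pixels per frame for a "1MP" sensor.
packed_mb_s = raw_rate_mb_per_s(cameras=4, pixels=1_000_000, bits_per_pixel=10, fps=30)
# Assumed: 10-bit samples padded to 16 bits in memory.
unpacked_mb_s = raw_rate_mb_per_s(cameras=4, pixels=1_000_000, bits_per_pixel=16, fps=30)

print(f"packed 10-bit: {packed_mb_s:.0f} MB/s, 16-bit containers: {unpacked_mb_s:.0f} MB/s")
```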
Until now I have been using the regular V4L driver, capturing RAW10 images and processing them with custom CUDA kernels. This setup performs well, with low CPU usage.
Note: to avoid cluttering this post, I show one representative tegrastats output per case, each of which is close to the mean usage:
RAM 1354/3956MB (lfb 426x4MB) SWAP 0/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU[15%@1479,5%@1479,3%@1479,6%@1479] EMC_FREQ 7%@1600 GR3D_FREQ 61%@537 VIC_FREQ 0%@140 APE 25 PLL@26.5C CPU@28.5C PMIC@100C GPU@25C AO@35C thermal@27C POM_5V_IN 2938/3146 POM_5V_GPU 442/498 POM_5V_CPU 442/546
To use functionality such as auto exposure, debayering, and denoising, we were planning to use Argus.
Running the argus-camera sample application and selecting multiSession with 4 cameras uses about half of all CPU resources:
RAM 1608/3956MB (lfb 418x4MB) SWAP 0/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [49%@1479,70%@1479,48%@1479,50%@1479] EMC_FREQ 17%@1600 GR3D_FREQ 32%@230 VIC_FREQ 31%@140 APE 25 PLL@30.5C CPU@32.5C PMIC@100C GPU@28.5C AO@38.5C thermal@??C POM_5V_IN 4340/4313 POM_5V_GPU 200/186 POM_5V_CPU 1446/1392
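To compare such tegrastats lines numerically, the per-core loads can be pulled out of the `CPU [...]` field and averaged. The following is a minimal sketch, not an official tool: the function name is made up for illustration, and the two sample strings are shortened excerpts of the outputs above (the regex allows for the optional space after `CPU`).

```python
import re

# Matches the per-core field, e.g. "CPU[15%@1479,...]" or "CPU [49%@1479,...]".
CPU_FIELD = re.compile(r"CPU\s*\[([^\]]*)\]")
# Matches one core entry such as "15%@1479" and captures the load percentage.
CORE_LOAD = re.compile(r"(\d+)%@\d+")


def mean_cpu_load(tegrastats_line):
    """Return the mean load in percent across the cores listed in one line."""
    match = CPU_FIELD.search(tegrastats_line)
    if match is None:
        raise ValueError("no CPU field found in tegrastats line")
    loads = [int(pct) for pct in CORE_LOAD.findall(match.group(1))]
    if not loads:
        raise ValueError("CPU field contains no per-core loads")
    return sum(loads) / len(loads)


# Shortened excerpts of the two outputs above.
v4l_cuda_line = "RAM 1354/3956MB CPU[15%@1479,5%@1479,3%@1479,6%@1479] EMC_FREQ 7%@1600"
argus_line = "RAM 1608/3956MB CPU [49%@1479,70%@1479,48%@1479,50%@1479] EMC_FREQ 17%@1600"

print(mean_cpu_load(v4l_cuda_line))  # 7.25
print(mean_cpu_load(argus_line))     # 54.25
```

Averaging over many tegrastats lines instead of one representative sample would give a more robust comparison.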
I have tried everything I could think of:
- Using cuEGLStream consumer
- Using egl renderer consumer
- Using Buffer streams instead of EGL streams
- Using a single session with multiple streams
- Using multiple sessions with a single stream each
All of them end up using about 50% of all 4 CPU cores.
Because our original bare V4L path was working well, I tried the new V4L2 Argus path exposed in JetPack 4.5, but with limited success. I ran into the following issues:
- The YUV capture format is only available when opening the device through v4l2_open calls, not the regular fd open calls. This is not a problem.
- However, opening a second camera with a v4l2_open call does not work. It looks like something is hogging resources once the first camera has been opened. Even when the O_NONBLOCK flag is passed to v4l2_open, the following error is returned:
Opening in BLOCKING MODE
(NvCameraUtils) Error InvalidState: Mutex already initialized (in Mutex.cpp, function initialize(), line 41)
(Argus) Error InvalidState: (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function open(), line 54)
(Argus) Error InvalidState: (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function openSocketConnection(), line 258)
(Argus) Error InvalidState: Cannot create camera provider (in src/rpc/socket/client/SocketClientDispatch.cpp, function createCameraProvider(), line 102)
ArgusV4L2_Open failed: Invalid argument
Opening in BLOCKING MODE
1614068772:210:540 Error : Device not support V4L2_CAP_VIDEO_CAPTURE_MPLANE
CPU usage for just a single camera is already twice that of our original V4L app using 4 cameras:
RAM 1565/3956MB (lfb 385x4MB) SWAP 0/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [20%@1479,17%@1479,20%@1479,20%@1479] EMC_FREQ 7%@1600 GR3D_FREQ 31%@153 VIC_FREQ 0%@140 APE 25 PLL@28.5C CPU@30C PMIC@100C GPU@27C AO@37C thermal@??C POM_5V_IN 3019/3099 POM_5V_GPU 120/133 POM_5V_CPU 724/77
Another strange thing is that with the Argus V4L pipeline, the timestamps in the buffers are in milliseconds and start at 0 for the first frame (so they are not system time, which makes it hard to sync with other types of sensors). The sequence member of the V4L buffer also does not increment.
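One possible application-side workaround, sketched here under assumptions, is to record the offset between a system clock and the buffer timestamp when the first frame arrives, apply that offset to later frames, and keep a local frame counter in place of the sequence member. The class and method names below are hypothetical and not part of any V4L2 or Argus API. The delivery latency of the first frame ends up in the offset, so the mapping is only approximate.

```python
import time


class TimestampAligner:
    """Maps zero-based millisecond buffer timestamps onto a system clock.

    Hypothetical helper: the offset is fixed when the first frame arrives,
    so it includes that frame's delivery latency.
    """

    def __init__(self):
        self.offset_s = None  # system time that corresponds to buffer time 0
        self.frame_count = 0  # local replacement for the stuck sequence field

    def align(self, buffer_ts_ms, arrival_s=None):
        """Return (estimated system time in seconds, local sequence number)."""
        if arrival_s is None:
            arrival_s = time.time()
        buffer_ts_s = buffer_ts_ms / 1000.0
        if self.offset_s is None:
            self.offset_s = arrival_s - buffer_ts_s
        sequence = self.frame_count
        self.frame_count += 1
        return self.offset_s + buffer_ts_s, sequence


aligner = TimestampAligner()
# First frame: buffer timestamp 0 ms, observed at system time 1000.0 s.
print(aligner.align(0, arrival_s=1000.0))
# Second frame: 33 ms later by buffer time, observed slightly late.
print(aligner.align(33, arrival_s=1000.05))
```

Passing `arrival_s` explicitly keeps the sketch deterministic; in a capture loop it would be read from whichever clock the other sensors use.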
One more general observation: in a single-session capture with multiple streams, the metadata returned for all streams is a duplicate of the first stream's metadata. Even when the output stream is set to TYPE_BUFFER and buffers are acquired at the consumers, the EGL image data inside the buffers differs per stream, but getting the IBuffer interface and querying its metadata always returns the metadata of the first stream, even though the buffer belongs to another stream. The only workaround I have found is to use multiSession capture.
There have been posts about high CPU usage with the ISP on Jetson TX2/Xavier. But since TX1/Nano have a different ISP pipeline, I posted this in the Jetson Nano category.
Can somebody please verify that 50% usage on all CPU cores for capturing four 1MP streams is what can be expected from Nano ISP performance?