Hello,
we are using the TX2 with R32.2.1 on a customer carrier board. The main application captures images from up to six image sensors (IMX290, 1920x1080 @ 30 fps) connected via CSI, using the nvargus-daemon through the Multimedia API.
Now that we are in the phase of optimizing the system, it turned out that the nvargus-daemon consumes a lot of CPU time.
Because the camera images need further processing in OpenCV after ‘acquireFrame’, each YUV420 image is color converted (to ARGB32) and copied into CPU memory with the NvConverter.
The CPU load for converting a single camera image is around 20% and scales with the number of processed images.
Using the NvConverter even introduces a non-deterministic frame latency of 4-20 ms!
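Since the ARGB32 intermediate only exists to feed OpenCV, the two conversions could in principle be collapsed into a single YUV420 -> RGB pass. As a point of reference for what such a pass involves per pixel, here is a minimal CPU-only sketch of direct I420 -> packed RGB24 (assuming full-range BT.601 coefficients; `i420_to_rgb24` is illustrative, not an existing API):

```cpp
#include <algorithm>
#include <cstdint>

// Clamp an intermediate value into the 0..255 byte range.
static inline uint8_t clamp8(int v) { return (uint8_t)std::min(255, std::max(0, v)); }

// Direct I420 (planar YUV420) -> packed RGB24 in one pass, collapsing the
// YUV420 -> ARGB32 -> RGB chain. Coefficients are full-range BT.601,
// scaled by 1024 to keep the inner loop in integer arithmetic.
void i420_to_rgb24(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                   int width, int height, uint8_t* rgb)
{
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            int Y = y[row * width + col];
            // Chroma planes are subsampled 2x2 in I420.
            int U = u[(row / 2) * (width / 2) + col / 2] - 128;
            int V = v[(row / 2) * (width / 2) + col / 2] - 128;
            uint8_t* px = rgb + (row * width + col) * 3;
            px[0] = clamp8(Y + ((1436 * V) >> 10));           // R = Y + 1.402 V
            px[1] = clamp8(Y - ((352 * U + 731 * V) >> 10));  // G = Y - 0.344 U - 0.714 V
            px[2] = clamp8(Y + ((1815 * U) >> 10));           // B = Y + 1.772 U
        }
    }
}
```

The same per-pixel math is what a custom CUDA kernel (question 3 below) would run, one thread per output pixel, so this also serves as a reference for validating such a kernel.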
%Cpu0 : 44.5 us, 46.5 sy, 0.0 ni, 5.7 id, 0.0 wa, 3.0 hi, 0.3 si, 0.0 st
%Cpu3 : 54.0 us, 40.6 sy, 0.0 ni, 5.0 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu4 : 55.1 us, 39.5 sy, 0.0 ni, 4.7 id, 0.0 wa, 0.7 hi, 0.0 si, 0.0 st
%Cpu5 : 57.0 us, 38.0 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
MiB Mem : 7871.1 total, 4465.4 free, 3200.1 used, 205.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5564.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4607 root 20 0 2644120 488096 364456 R 125.2 6.1 2:11.44 handle_cams_4
3743 root 20 0 18.7g 534972 48812 S 121.6 6.6 6:29.43 nvargus-daemon
4728 root 20 0 1522856 268424 197172 R 81.4 3.3 0:15.38 handle_cams_2
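For what it is worth, the 4-20 ms latency spread is straightforward to reproduce with a small std::chrono harness bracketing the conversion call. `measure_ms` below is a generic sketch; the callable passed in would wrap the actual NvConverter round trip for one frame:

```cpp
#include <algorithm>
#include <chrono>
#include <utility>
#include <vector>

// Time n invocations of fn and return {min, max} latency in milliseconds.
// fn is a stand-in for the per-frame conversion call whose jitter is
// being quantified.
template <typename F>
std::pair<double, double> measure_ms(F fn, int n)
{
    using clock = std::chrono::steady_clock;
    std::vector<double> samples;
    samples.reserve(n);
    for (int i = 0; i < n; ++i) {
        auto t0 = clock::now();
        fn();
        auto t1 = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    auto mm = std::minmax_element(samples.begin(), samples.end());
    return {*mm.first, *mm.second};
}
```

A large gap between the returned min and max over a few hundred frames is what indicates the non-deterministic latency described above.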
Questions:
1.) Is there a more efficient way of converting the captured images?
2.) Is there a way to avoid the double color conversion needed for OpenCV (YUV420 -> ARGB32 -> RGB)?
3.) What are the experiences with implementing a custom CUDA color-conversion kernel, and what performance impact can be expected? Does this approach also work for multi-process applications?
And last: what exactly is the nvargus-daemon doing to consume such a huge amount of system performance?
Is it a software ISP that runs not on a dedicated processor but on the CPU cores intended for customer use?