We have six CSI cameras that we capture from using Argus at 30 fps. We’ve noticed that simply streaming the cameras by calling iFrameConsumer->acquireFrame() incurs a fairly high CPU load (~110% according to top), so it takes more than a full core just to stream the cameras without any other processing. We’d love to reduce that if possible. We are currently on JetPack 4.2.2 with Argus in single-process mode.
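For reference, the per-camera acquire loop looks roughly like this (a simplified sketch; the function and variable names here are illustrative, not our actual code):

```cpp
#include <Argus/Argus.h>
#include <EGLStream/EGLStream.h>

using namespace Argus;
using namespace EGLStream;

// One of these loops runs per camera; it does no processing of its own.
void acquireLoop(IFrameConsumer *iFrameConsumer)
{
    while (true)
    {
        // Blocks until the next frame is available. Just driving this call
        // for six cameras produces the SCF_Execution / CaptureSchedule load
        // described above.
        UniqueObj<Frame> frame(iFrameConsumer->acquireFrame());
        if (!frame)
            break;

        IFrame *iFrame = interface_cast<IFrame>(frame);
        // ... hand iFrame->getImage() off for scaling ...
    }
}
```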
Based on profiling, the biggest contributors appear to be:
SCF_Execution. There appear to be two SCF_Execution threads per camera (I see 12 of those threads), and together they account for ~75%. According to the profile, roughly 10% of SCF_Execution time is just powf() calls. I don’t have symbols for most of the remaining processing, but it is almost all in libnvscf.so and libnvos.so. Is it possible some of this is image processing, or does that all happen on the ISP?
CaptureSchedule. This thread accounts for ~20%. Again no useful symbols; most of the time is in libnvos.so and libnvscf.so.
After acquireFrame() we get the IImageNativeBuffer interface and call copyToNvBuffer() to scale the image into another NvBuffer. That also takes a fair amount of CPU time (~2.5% per camera) even though the work runs on the VIC. In the profile, the CPU time inside copyToNvBuffer() breaks down into VicConfigure, VicCreateSession, VicExecute, and VicFreeSession, and only ~40% of it is in VicExecute. The rest is in Configure/CreateSession/FreeSession, so much of the cost appears to be per-call session overhead rather than the actual blit.
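The scale-and-copy step is essentially the following (simplified sketch; names are illustrative, and the destination NvBuffer is pre-allocated elsewhere with NvBufferCreate):

```cpp
#include <Argus/Argus.h>
#include <EGLStream/EGLStream.h>
#include <EGLStream/NV/ImageNativeBuffer.h>
#include <nvbuf_utils.h>

using namespace Argus;
using namespace EGLStream;

// Scales the captured image into a pre-allocated destination NvBuffer.
// Per the profile, copyToNvBuffer() internally runs VicCreateSession /
// VicConfigure / VicExecute / VicFreeSession on every call, so most of
// its CPU cost is session setup/teardown, not VicExecute itself.
void scaleFrame(Frame *frame, int dstFd)
{
    IFrame *iFrame = interface_cast<IFrame>(frame);
    NV::IImageNativeBuffer *iNativeBuffer =
        interface_cast<NV::IImageNativeBuffer>(iFrame->getImage());
    iNativeBuffer->copyToNvBuffer(dstFd);
}
```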
Total time in SCF_Execution and CaptureSchedule appears to scale linearly with the frame rate. Running the cameras at 10 fps instead of 30 fps reduces the CPU usage by ~33%, but that isn’t a great solution. Is there anything else we could do to reduce the CPU load?
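For the frame-rate test above, we drop the rate by pinning the frame duration in the capture request, roughly like this (sketch; the Request object comes from our normal setup code):

```cpp
#include <Argus/Argus.h>

using namespace Argus;

// Pins the sensor to a fixed frame rate by setting a degenerate
// frame-duration range (min == max) on the request's source settings.
void setFps(Request *request, double fps)
{
    IRequest *iRequest = interface_cast<IRequest>(request);
    ISourceSettings *iSourceSettings =
        interface_cast<ISourceSettings>(iRequest->getSourceSettings());

    // Frame duration is specified in nanoseconds.
    uint64_t frameDurationNs = static_cast<uint64_t>(1e9 / fps);
    iSourceSettings->setFrameDurationRange(Range<uint64_t>(frameDurationNs));
}
```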
EDIT: in case it matters, I’m using CAPTURE_INTENT_PREVIEW in createRequest() and leaving everything else (edge enhancement mode, denoise mode, etc.) at its default.
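Concretely, request creation is just the default preview-intent form, something like:

```cpp
#include <Argus/Argus.h>

using namespace Argus;

// Sketch: iCaptureSession comes from our normal per-camera setup.
// No edge enhancement, denoise, or other settings are touched afterwards.
UniqueObj<Request> makePreviewRequest(ICaptureSession *iCaptureSession)
{
    return UniqueObj<Request>(
        iCaptureSession->createRequest(CAPTURE_INTENT_PREVIEW));
}
```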
EDIT 2: after more testing, the number of SCF_Execution threads doesn’t depend on the number of cameras being streamed; there are always 12 SCF_Execution threads. The total CPU usage by Argus (mainly in SCF_Execution and CaptureSchedule) scales roughly linearly with the total number of images per second (number of cameras * FPS).
EDIT 3: Argus creates a remarkable number of threads. From what I can tell, simply initializing Argus creates 25 threads, and an additional ~15 are created per streamed camera. Most use little or no CPU and appear to be tied to different stages of the image pipeline (DeFogStage, GpuBlitStage, SharpenStage, VideoStabilization, HdfxStage, AoltmStage, etc.). I’m really curious now exactly what the SCF_Execution and CaptureSchedule threads are doing to eat up so much CPU, which processing happens on the ISP versus the CPU or GPU, and why so many threads are required.