Camera timeout at high CPU load


In my main application, I use Argus to capture images from three synchronized cameras and process them in a VisionWorks pipeline. On its own, everything works fine: as expected, the GPU load is high and the CPU load stays low.

Now, I want to run another, unrelated application in parallel which utilizes almost 100% of the CPU. After some time, my main application randomly stops working because of timeouts during image capture.

I get the following output in my main application:

SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 92)
SCF_AutocontrolACSync failed to wait for an earlier frame to complete.

SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
SCF: Error Timeout: Sending critical error event (in src/api/Session.cpp, function sendErrorEvent(), line 992)
SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 689)
(Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
SCF: Error InvalidState: Session has suffered a critical failure (in src/api/Session.cpp, function capture(), line 689)
(Argus) Error InvalidState:  (propagating from src/api/ScfCaptureThread.cpp, function run(), line 109)
(the last two lines are printed repeatedly)

It seems that the problem only occurs at high CPU load; I did not observe this behavior without the second application running.
Is this a bug in the camera stack? Can I change something in my code to work around this issue? I would appreciate any help. Thanks!

Do be sure to run the “~ubuntu/” (or “~nvidia/”) script for max performance.

Does your CPU-consuming program depend on outside hardware I/O? If it does, there may not be much you can do other than perhaps lowering the priority of your process so the cameras get a larger share. Perhaps the software-only portions could run in a different thread than the I/O portions, with the hardware I/O portion kept as lean and fast as possible (the hardware I/O would then compete with the cameras for a smaller time slice… you could perhaps use affinity to migrate the other threads to other cores).
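
As a sketch of the priority route (assuming util-linux renice is available; “yes > /dev/null” is just a stand-in for the unrelated CPU-bound application):

```shell
# Start the competing job at the lowest scheduling priority (nice 19)
# so the camera pipeline wins whenever both want CPU time.
nice -n 19 sh -c 'yes > /dev/null' &
pid=$!

# A process that is already running can be demoted the same way
# (lowering priority is unprivileged; raising it back needs root):
renice -n 19 -p "$pid"

kill "$pid"
```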

If the program does not depend on hardware I/O (e.g., it works with data already in memory because the cameras fetched it there), then you might be able to mask your non-I/O CPU-consuming process away from CPU0 (CPU0 is where much of the hardware I/O is restricted to… any software can run on any core when there is no outside I/O requiring CPU0).
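
A sketch of that masking with taskset (from util-linux); “yes > /dev/null” again stands in for the real workload, and nproc minus one computes the highest core index (5 on this board):

```shell
# Run the CPU-bound job on every core except CPU0.
last=$(( $(nproc) - 1 ))        # highest core index (5 on a 6-core SoC)
taskset -c "1-$last" sh -c 'yes > /dev/null' &
pid=$!

# Confirm the mask actually took effect:
taskset -cp "$pid"
kill "$pid"
```

The same “-cp” form can retarget a process that is already running.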

The second program does not depend on outside hardware I/O. It can also be simulated with stress, e.g.:

stress -c 6

I tried to adjust the nice value in order to increase the priority of the application which accesses the cameras. Unfortunately, this did not help. Is there a mechanism to prevent other processes from using CPU0?

CPU scheduling looks at an affinity mask (keep in mind a “mask” is the inverse applied over the top of “allowed”). One can assign (or more correctly, strongly suggest) which cores a process should or should not use. There are multiple ways to access this, e.g., from within the program itself, or through outside settings (some of which require root permissions). There can also be a finer division based on threads instead of whole processes.


man sched_setaffinity
man pthread_setaffinity_np
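
No code is needed just to check what a task is currently allowed; the kernel exposes each task’s mask in /proc, e.g.:

```shell
# “self” is whichever process reads the file (here, the shell running grep);
# on an unrestricted 6-core system this reports something like “0-5”.
grep Cpus_allowed_list /proc/self/status
```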

You have access to “soft” affinity… and if allowed, this results in the same “hard” affinity. If one divides a driver up correctly, then the smallest possible amount of work is done with hardware I/O, and the rest is migrated to another CPU core (via ksoftirqd). You don’t have control over other people’s driver designs, but many of the drivers at the core of the system are well optimized. You do have access to the priorities of PIDs… be careful with adjusting other people’s PIDs or increasing your own PID’s priority (too negative a “nice” value on your process could deadlock the system… it would be rare for anything more negative than “-2” to improve something without a risk of destroying something you never intended to change).

If you run this, you’ll see that affinity is related to IRQs (it is the IRQ which triggers the running of a driver or other code):

find /proc -name "*affinity*"

You should always see “/proc/irq/1/”. If you “cat” the “affinity_hint” file, you will see a hexadecimal mask, with “0x00” not removing any core from the list (and the emphasis is on “_hint” in most cases). In fact, if you run this, you’ll likely see the default that any IRQ can run on any core:

cat `find /proc/irq -name "affinity_hint"`


You might find this more interesting (uniq’s “-c” counts how many times each line occurs):

cat `find /proc/irq -name 'smp_affinity_list'` | sort | uniq -c

…these correspond to cores. Most are “0-5” (all 6 cores being available). The ones corresponding to just “0” are likely drivers doing hardware I/O.

You’ll probably ask how to map an IRQ to a PID, but I don’t think this can be done… the listing of IRQ “/proc” files above is for illustration. However, you can sort of do this via:

cat /proc/interrupts
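
To total that table per core (a sketch which assumes the standard /proc/interrupts layout, where the header row names the CPU columns and each “N:” row carries one count per CPU):

```shell
# Sum the interrupt counts in each CPU column of /proc/interrupts.
awk 'NR == 1 { n = NF; next }
     /:/     { for (i = 2; i <= n + 1; i++) sum[i] += $i }
     END     { for (i = 2; i <= n + 1; i++) printf "CPU%d: %.0f\n", i - 2, sum[i] }' \
    /proc/interrupts
```

On this SoC the CPU0 total dwarfs the others, which is the imbalance described above.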

If you do this, you’ll see a very large number of interrupts on CPU0, since this is the only core wired to many of the controllers you see performing interrupts there. If your application can run without access to those hardware controllers, then it is a good candidate for masking to disallow CPU0. If your program does both hardware I/O and software computation that requires no hardware controller I/O, then you would be advised to split the program into two threads and set the affinity of the software thread to disallow CPU0. If the process does mainly hardware I/O, then it will end up on CPU0 no matter what affinity you set (you could tell it to go to CPU1, but it would migrate back to CPU0). This isn’t always going to do what you want, since cache plays a part in performance and splitting across cores can result in cache misses. On the other hand, since CPU0 is a very critical resource, the reduced time on this core can be a very important benefit even if a given program or driver is slightly slower as a result.

I don’t know whether it is part of the ARM architecture itself or specific to this particular SoC, but if there were a mechanism to distribute interrupts over all cores, this same SoC would get a massive performance boost. Many IRQs would have very slightly increased latency, but the overall system would become much better behaved even under the heaviest of loads. Intel x86_64 CPUs have hardware support for this, working with the IO-APIC… this SoC has no such support; it simply has an aggregator which sends much of the work only to CPU0.