CPU Cores occupy 100% On Xavier NX

Hello,

I wrote an application that grabs raw images from three video channels (three cameras, one thread each)
simultaneously at maximum frame rate (two cameras at 144 fps and the third at 226 fps).

After about 1 hour, I observed strange system behavior: 3 CPU cores sit at 100% usage while the other 3 cores are completely idle.

Everything works normally again after rebooting the NX. Restarting only my application does not recover it.

Frequency:

jetson_clocks --show

SOC family:tegra194 Machine:NVIDIA Jetson Xavier NX Developer Kit
Online CPUs: 0-5
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu1: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu2: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu3: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu4: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu5: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
GPU MinFreq=1109250000 MaxFreq=1109250000 CurrentFreq=1109250000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: speed=255
NV Power Mode: MODE_15W_6CORE

# free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.5G        3.8G         19M        2.3G        5.3G
Swap:            0B          0B          0B

CPU Governor set to performance

What is happening? How can I debug and fix this?

I would really appreciate your suggestions and support.

Could you capture with v4l2-ctl instead, to clarify whether the issue is related to the app?

Hi,

In our application, we copy each frame with memcpy() into a big buffer allocated with calloc().

The memcpy() is what causes this issue. Is there a better way to do this?

I don't think there is a way to optimize memcpy() for large memory blocks. Try passing a pointer to the image instead of copying it.
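To illustrate what passing a pointer instead of copying could look like, here is a minimal sketch. It assumes the consumer can work directly on the driver-mapped buffer and hands the buffer back when it is done. The names `frame_ref`, `frame_queue`, `fq_push` and `fq_pop` are hypothetical and not part of V4L2, and the queue is deliberately single-threaded; a real multi-threaded application would need a mutex or a lock-free queue around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: it refers to a driver-mapped frame and owns no pixels. */
typedef struct {
    const uint8_t *data;   /* start of the mmap'ed frame */
    size_t         length; /* number of valid bytes in the frame */
    unsigned       index;  /* buffer index to re-queue once processing is done */
} frame_ref;

#define QUEUE_CAP 16       /* assumption: matches the 16 capture buffers in the thread */

typedef struct {
    frame_ref slots[QUEUE_CAP];
    size_t head, tail, count;
} frame_queue;

static void fq_init(frame_queue *q)
{
    assert(q != NULL);
    q->head = q->tail = q->count = 0;
}

/* Store a small descriptor instead of copying the image. Returns -1 if full. */
static int fq_push(frame_queue *q, frame_ref f)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return -1;
    q->slots[q->tail] = f;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    return 0;
}

/* Fetch the oldest descriptor. Returns -1 if the queue is empty. */
static int fq_pop(frame_queue *q, frame_ref *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return -1;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```

Each push moves only a few bytes of metadata regardless of image size. The trade-off is that a buffer cannot be given back to the driver until the consumer has finished with it, so slow processing will eventually starve the capture side.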

memcpy() causes the issue, but I'm quite surprised that only three CPU cores are used while all the others stay idle. Why aren't the other cores used if the current ones are fully occupied? I didn't set any CPU affinity; the threads are free-running.
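One plausible reading of the three busy cores: memcpy() runs entirely inside the calling thread, and a thread executes on only one core at a time, so three capture threads can keep at most three cores busy no matter how many are idle. To rule out an inherited affinity mask, a thread can query its own mask on Linux. The sketch below is only a debugging aid; the helper name `allowed_cpu_count` is made up for illustration.

```c
#define _GNU_SOURCE   /* needed for sched_getaffinity() and the CPU_* macros */
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Prints the CPUs the calling thread may run on and returns how many
 * there are, or -1 if the affinity mask could not be read. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_getaffinity");
        return -1;
    }

    printf("allowed CPUs:");
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            printf(" %d", cpu);
    }
    printf("\n");

    int n = CPU_COUNT(&set);
    assert(n >= 1 && n <= CPU_SETSIZE);
    return n;
}
```

Calling this at the start of each capture thread shows whether the scheduler is allowed to place it on any online core. If every thread reports the full set, the load pattern comes from having three busy threads rather than from pinning.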

Does your use case really need a memory copy, instead of using the mmap'ed buffers directly for better performance?

My use case is to grab raw images at maximum fps through the V4L2 API into a big user-space buffer, which is then processed by our algorithms.

I'm using the IO_METHOD_MMAP method to grab raw images, with 16 buffers allocated.
As you know, with this method I can't mmap new buffers while grabbing images without stopping the stream.

Maybe you can refer to the multimedia API sample code, such as …/tegra_multimedia_api/samples/12_camera_v4l2_cuda or v4l2cuda, to see how mmap'ed buffers are used for post-processing.