Hello all,
We have noticed a performance degradation when moving from an L4T 32.5 installation (Ubuntu 18.04 with kernel 4.9) to L4T 35.5 (Ubuntu 20.04, kernel 5.10) on our Jetson AGX Xavier based industrial PC. The issue is also reproducible on a freshly installed Jetson AGX Xavier devkit.
The scheduling of parallel tasks is noticeably worse on the newer release (2x to 7x slower, depending on the load), and since we run our devices at the limit of their performance, this significantly affects our processes.
To demonstrate this, we created a small example, openmp-test.cpp (741 bytes), that links against the OpenMP library and spins up 100 threads performing basic copy operations. We compared its performance on the two installations mentioned above, flashed on the same devkit, and got the following results:
Ubuntu 18.04: 150ms
Ubuntu 20.04: 460ms
Note that these tests were run on the Xavier devkit with a fresh installation and no other “significant” load on the device.
In practice, on our industrial machine with our computation-intensive application running on the device, the numbers are as follows:
Ubuntu 18.04 (with running app): 150ms
Ubuntu 20.04 (without running app): 270ms
Ubuntu 20.04 (with running app): 850ms
So we can clearly see that this has something to do with how the CPU schedules the threads.
We tried some parameter changes (ulimits, sysctl modifications) to reduce this difference, but we were not able to find a solution.
This is why we are wondering whether this issue has been reported before by other users, and whether you can point us to configuration changes that would bring us back to the performance we had with L4T 32.
Note: The file can be compiled and linked against OpenMP with: clang++ openmp-test.cpp -fopenmp
Thank you for your support