Real-Time performance on DRIVE AGX

Hello,

We have recently received a DRIVE AGX Xavier Developer Kit.

Concurrent Real-Time develops an RTOS called “RedHawk”, which we are trying to run on the DRIVE AGX we have received. We have successfully released RedHawk as a product on all of the Jetson platforms, including Jetson Xavier. On all these platforms, we have been able to get real-time determinism and a very low latency of less than 50us.

https://www.concurrent-rt.com/products/redhawk-linux/

However, during our initial testing on the DRIVE AGX platform, we have observed a maximum latency of 400-500us over (at least) a 24-hour run. With the modifications we made for Jetson Xavier plus some RedHawk-specific tuning, we have been able to get a much better maximum latency, though still not as low as what we get on the Jetson Xavier.

I understand that there’s a hypervisor involved on the DRIVE platform, whereas Jetson Xavier runs on bare metal, yet we expected better real-time numbers on DRIVE AGX than what we are currently seeing.

To get the real-time latency numbers, we are running a standard open-source application called “cyclictest(8)” (part of the rt-tests package) on a “shielded” processor. Shielding is a RedHawk-specific feature that protects the CPU(s) from some set of system activity, providing better determinism while running the application code. We also run stress(1) on the system to try to achieve a worst-case scenario.

We have run similar tests on the PREEMPT_RT kernel that gets flashed through the SDKManager, and we have seen cyclictest(8) reach 3-4 milliseconds within a couple of minutes (it may be higher for a 24-hour run).

So, I was wondering if you have run similar tests on the system on your end and if there’s a way to achieve better real-time performance (determinism as well as low-latency).

Dear pablo.ongini99501,
Could you share your binaries so that we can reproduce this locally?

Hello SivaRamaKrishna,

As I have mentioned in my post, cyclictest(8) is a standard open-source application to measure real-time latencies. And stress(1) is again an open-source tool to stress the system with different loads.

Both can be installed on a system having internet connection with:
$ sudo apt install rt-tests stress

However, I am attaching both the binaries to this post (dunno if that’s possible, though; EDIT: I have attached both in the tar).
binaries.tar (80 KB)

Hi pablo.ongini99501,

May I know which version you tested on? Did you run cyclictest with or without stress? With any options?
If you can let me know your detailed reproduction steps, it will be easier for me to check with our QA team. Thanks!

Hello vickyy,

Sure.

The board details are below:
We have flashed the DRIVE AGX with DRIVE OS v5.1.0.2 and DRIVE Software 9.0 through SDK Manager.
XavierA has been updated with the RedHawk kernel, whereas XavierB is still running the PREEMPT_RT kernel flashed from the SDK Manager.

root@tegra-ubuntu:/home/ubuntu# cat /proc/device-tree/model 
e3550_t194b
root@tegra-ubuntu:/home/ubuntu#
root@tegra-ubuntu:/home/ubuntu# uname -a
Linux tegra-ubuntu 4.9.131-rt93-tegra #1 SMP PREEMPT RT Fri May 3 22:51:05 PDT 2019 aarch64 aarch64 aarch64 GNU/Linux
root@tegra-ubuntu:/home/ubuntu#

Is there any other board detail required? If so, please let me know how to obtain it.

root@tegra-ubuntu:/home/ubuntu# stress --version
stress 1.0.4
root@tegra-ubuntu:/home/ubuntu#
root@tegra-ubuntu:/home/ubuntu# cyclictest --version
cyclictest: unrecognized option '--version'
cyclictest V 0.93

I think Ubuntu18.04 has the latest cyclictest(8) version, 1.0. But that should not matter much.

stress(1) is running every time we run the test. We do that even when benchmarking the RedHawk kernel. Here’s how I am running the tests:

root@tegra-ubuntu:/home/ubuntu# stress --vm 6 &
[1] 31497
stress: info: [31497] dispatching hogs: 0 cpu, 0 io, 6 vm, 0 hdd
root@tegra-ubuntu:/home/ubuntu#
root@tegra-ubuntu:~# cyclictest -m -p 85
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 8.51 4.11 1.62 6/615 2325           

T: 0 ( 2325) P:85 I:1000 C:  32780 Min:     28 Act:   59 Avg:   59 Max:    3081
root@tegra-ubuntu:~#
root@tegra-ubuntu:~# cyclictest -m -p 95
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 8.74 5.46 2.36 7/615 2333           

T: 0 ( 2331) P:95 I:1000 C:  41508 Min:     29 Act:   47 Avg:   49 Max:    3025
root@tegra-ubuntu:~#
root@tegra-ubuntu:~# cyclictest -m -p 95 -S
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 9.20 6.63 3.14 7/612 2481           

T: 0 ( 2473) P:95 I:1000 C:  79407 Min:     11 Act:   19 Avg:   19 Max:     740
T: 1 ( 2474) P:95 I:1500 C:  52933 Min:      7 Act:   21 Avg:   20 Max:     540
T: 2 ( 2475) P:95 I:2000 C:  39697 Min:     13 Act:   17 Avg:   22 Max:     618
T: 3 ( 2476) P:95 I:2500 C:  31754 Min:     14 Act:   24 Avg:   23 Max:     441
T: 4 ( 2477) P:95 I:3000 C:  26460 Min:     13 Act:   27 Avg:   25 Max:     520
T: 5 ( 2478) P:95 I:3500 C:  22677 Min:     14 Act:   23 Avg:   25 Max:     498
root@tegra-ubuntu:~#

But I have run it without stress as well:

root@tegra-ubuntu:~# ps -ef | grep stress
root      2497  2238  0 11:53 pts/6    00:00:00 grep --color=auto stress
root@tegra-ubuntu:~# cyclictest -m -p 95
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 4.45 6.81 4.50 2/598 2501          

T: 0 ( 2499) P:95 I:1000 C:  41632 Min:     27 Act:   35 Avg:   35 Max:    2826
^Croot@tegra-ubuntu:~#

All of the above numbers were taken after running the test for a few minutes.
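
For comparing runs, the per-thread summary lines can be reduced to a single worst-case figure. Below is a minimal Python sketch, assuming the line layout shown in the output above. The names THREAD_LINE, parse_cyclictest and worst_case_us are made up for illustration and are not part of rt-tests.

```python
import re

# Matches one cyclictest summary line such as:
# T: 0 ( 2473) P:95 I:1000 C:  79407 Min:     11 Act:   19 Avg:   19 Max:     740
# (assumed layout, taken from the output pasted in this thread)
THREAD_LINE = re.compile(
    r"T:\s*(?P<thread>\d+)\s*\(\s*(?P<tid>\d+)\)\s*"
    r"P:\s*(?P<prio>\d+)\s*I:\s*(?P<interval>\d+)\s*C:\s*(?P<count>\d+)\s*"
    r"Min:\s*(?P<min>\d+)\s*Act:\s*(?P<act>\d+)\s*"
    r"Avg:\s*(?P<avg>\d+)\s*Max:\s*(?P<max>\d+)"
)


def parse_cyclictest(text):
    """Return one dict of integer fields per thread line found in text."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in THREAD_LINE.finditer(text)
    ]


def worst_case_us(text):
    """Largest Max value (in microseconds) across all threads, or None."""
    rows = parse_cyclictest(text)
    return max(row["max"] for row in rows) if rows else None
```

Feeding it the captured terminal output of a run gives the highest Max across all threads, which is the number being compared throughout this thread.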

Please let me know if more info is needed. Also, if you benchmarked the board at your end, we would like to know how and what tools were used.

Thank you for the detailed information! We are checking with our QA team and will get back to you.

Hello VickNV,

Any update on this?

Hi pablo.ongini99501,

Thanks for reminding me!

I tried your steps with DRIVE Software 10.0 and the results look better.
Though we will announce the release in a few days, you may already be able to access it via SDK Manager.
If that’s the case, you can give it a try too.

root@tegra-ubuntu:/home/nvidia# stress --vm 6 &
[1] 4979
root@tegra-ubuntu:/home/nvidia# stress: info: [4979] dispatching hogs: 0 cpu, 0 io, 6 vm, 0 hdd
root@tegra-ubuntu:/home/nvidia# cyclictest -m -p 85
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 7.93 3.86 1.76 7/753 5009           

T: 0 ( 4987) P:85 I:1000 C:  31628 Min:     26 Act:   36 Avg:   44 Max:     317
^Croot@tegra-ubuntu:/home/nvidia# cyclictest -m -p 95
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 9.31 5.21 2.41 7/753 5030           

T: 0 ( 5021) P:95 I:1000 C:  41306 Min:     26 Act:   38 Avg:   43 Max:     241
^Croot@tegra-ubuntu:/home/nvidia# cyclictest -m -p 95 -S
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 10.20 6.57 3.16 3/761 5080           

T: 0 ( 5044) P:95 I:1000 C:  79520 Min:     10 Act:   17 Avg:   18 Max:     161
T: 1 ( 5045) P:95 I:1500 C:  53007 Min:     11 Act:   19 Avg:   19 Max:     128
T: 2 ( 5046) P:95 I:2000 C:  39745 Min:     12 Act:   21 Avg:   20 Max:     174
T: 3 ( 5047) P:95 I:2500 C:  31791 Min:      9 Act:   23 Avg:   19 Max:     213
T: 4 ( 5048) P:95 I:3000 C:  26487 Min:     10 Act:   15 Avg:   18 Max:     148
T: 5 ( 5049) P:95 I:3500 C:  22695 Min:     11 Act:   19 Avg:   20 Max:     141

Hello VickNV,

Thank you for the quick response.

I do see DRIVE Software 10.0 in my SDKManager account. It requires Ubuntu18.04 and my system is 16.04. I will update my system and flash the AGX Xavier with DRIVE Software 10.0.

The numbers look good, though they cover only a few minutes. I hope they stay within range over a long period of time.

Once I flash my box with DRIVE Sw 10.0 and run the numbers, I will post here.

Hello VickyNV,

We have a system set up with Ubuntu18.04 and DRIVE Software 10.0, and the DRIVE AGX has been flashed with this software. I have run the cyclictest(8) program for 48h and the max I have seen is 2.8ms.

nvidia@tegra-ubuntu:~$ uname -a | more                                          
Linux tegra-ubuntu 4.14.102-rt53-tegra #1 SMP PREEMPT RT Fri Sep 20 16:23:45 PDT
 2019 aarch64 aarch64 aarch64 GNU/Linux                                         
nvidia@tegra-ubuntu:~$
nvidia@tegra-ubuntu:~$ sudo cyclictest -a 3 -m -p 85 -D48h
policy: fifo: loadavg: 11.74 12.10 12.01 9/663 23037
T: 0 ( 3429) P:85 I:1000 C:124988028 Min:     17 Act:   67 Avg:   42 Max:    2800

This test was run on Xavier B. However, it is now in a weird state: I can’t get into it through either a minicom or an ssh session. The count ('C:') value above shows that it had run for about 34h before getting into this state.
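
The runtime estimate follows from the counter: each completed cycle lasts roughly one interval, so the elapsed time is about C × I. A small Python sketch of that arithmetic, assuming the thread never fell significantly behind its 1000us interval (elapsed_hours is an illustrative name, not a cyclictest feature):

```python
def elapsed_hours(count, interval_us):
    """Approximate runtime in hours from cyclictest's C: and I: fields.

    Assumes every cycle took about interval_us microseconds, so this
    slightly underestimates the runtime if many wakeups were late.
    """
    total_seconds = count * interval_us / 1_000_000
    return total_seconds / 3600


# Values from the summary line above: C:124988028, I:1000
hours = elapsed_hours(124988028, 1000)
print(f"{hours:.1f} h")
```

For the values above this comes to roughly 34.7 hours, consistent with the “about 34h” estimate.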

How soon did the 2.8 ms max latency occur in your 48-hour test?

We internally tested only 30 min (also with other different settings from yours) so maybe that’s why we didn’t see such max latency as you saw. I’ll check with the team first.

Hello,

I don’t remember exactly when this occurred as I just let it run over the weekend, but I remember seeing a high number (maybe close to 2ms) during the first couple of hours, I guess.

BTW, another interesting thing I noticed is that there’s a very high latency spike (85-105ms) when there’s some activity going on on the serial console. I have not run enough experiments with it to be certain, but I saw it a couple of times.

This is how I found out:

  1. Open a serial connection to the Xavier. I have used ‘minicom’.
  2. In another terminal, ssh into the Xavier, start the stress(1) load and cyclictest(8) on that window.
  3. On the terminal running the minicom program, run top(1) or any watch(1) command. You may see latency spikes occurring on the terminal running cyclictest(8).

As I said, I have not performed enough experiments with it and it could just be my setup, but it’ll be great if you can confirm or deny this.

Hi,

For the command ($ sudo cyclictest -a 3 -m -p 85 -D48h) where you observed the 2.8 ms max latency, did you run it from the serial console or over ssh? Could you run it with the -q option to see if it helps?

Hello VickNV,

We had run this command in an ssh session with no serial console. So, there was no activity running on the serial console.

Well, I would have run the above command with the ‘-q’ option, but as I noticed during this test, the system rebooted before completing 48 hours. Judging by the count (C:) value, that last test ran for about 34 hours, so with the -q option, which prints the summary only on exit, I might never have seen the max value. But I can still give it a try during these holidays.

I can also try to run the command without the -q option but with the -M option, i.e. refresh on max. Maybe that helps.
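
One possible workaround for losing the -q summary when the board reboots is to split the long run into shorter quiet segments and save each summary as it completes. The following is only a Python sketch of that idea, not something that was run in this thread. segment_command, run_segments and the log path are hypothetical names, and the CPU and priority defaults simply mirror the command used above. The script would need to run with enough privileges for cyclictest (e.g. under sudo), and latency occurring in the short gaps between segments is not measured.

```python
import os
import subprocess


def segment_command(cpu, priority, minutes):
    """Build the argv for one quiet cyclictest segment of the given length."""
    return ["cyclictest", "-a", str(cpu), "-m", "-p", str(priority),
            "-q", f"-D{minutes}m"]


def run_segments(total_hours, minutes, log_path, cpu=3, priority=85,
                 runner=subprocess.run):
    """Run back-to-back segments, appending each summary to log_path.

    The log is flushed and fsync'ed after every segment so that a crash
    or reboot loses at most the segment that was in progress.
    Returns the number of segments that completed.
    """
    segments = (total_hours * 60) // minutes
    done = 0
    with open(log_path, "a") as log:
        for _ in range(segments):
            result = runner(segment_command(cpu, priority, minutes),
                            capture_output=True, text=True)
            log.write(result.stdout)
            log.flush()
            os.fsync(log.fileno())
            done += 1
    return done
```

For example, `run_segments(48, 30, "cyclictest_segments.log")` would attempt 96 half-hour segments. The runner parameter exists only so the loop can be exercised without a real cyclictest binary.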

I will update you after my testing.

Hello VickNV,

I have tried running cyclictest(8) with the -q option over a 20h period on the PREEMPT_RT kernel flashed by the SDKManager, but the system had become unresponsive by the time I arrived the next day. That’s not good.

I am gonna run the program again but with -M option instead of -q and get back to you.

Did you get a chance to run it on your end?

Hi pablo.ongini99501,

Sorry for the delayed response!

The current DRIVE Software release is still a development enabler for functionality and not performance. For RT performance, you need to wait for a future software release. Thanks!

If you have any critical need on this, please contact your NVIDIA representative.