Question about timing on the TX2

I am implementing a radar processing pipeline (algorithm blocks taken from the literature) on the Jetson TX2, using CUDA to try to get the fastest processing time possible.
I am working on a remote setup: a host running Linux with the Eclipse development environment, connected to the device over an Ethernet cable.
This is the first time I am doing such a project, and when I run the code (in release mode for better timing) I get different run-time results from run to run (they can be very different, e.g. 70 ms one time and 110 ms another).
My question is: what are the possible reasons for this? Why is the timing so unstable?

To get the best possible results, start by selecting the maximum-performance power model:
sudo nvpmodel -m 0
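
You can verify which mode is currently active with the query option (this is how the nvpmodel builds I have seen behave; check "nvpmodel --help" if your release differs):
sudo nvpmodel -q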

Then set the clocks to their maximums to prevent any energy-saving modes from kicking in (under load the clocks would ramp back up anyway, but that ramp-up adds latency):
sudo jetson_clocks
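
If your L4T/JetPack release supports it, you can also dump the current clock state to confirm nothing is still scaled down (the --show option is present on the releases I have used; "jetson_clocks --help" will tell you if it is not):
sudo jetson_clocks --show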

Note that “jetson_clocks” is just a human-readable script, so you can see exactly what it does. One thing it does which you may want to change after testing is forcing the fan to 100% at all times (when testing maximum performance most people want to be sure heat generation doesn’t cause throttling, but an automatic fan setting may be appropriate for actual use).

nvpmodel can also store the current mode and later reset to that memorized mode. See “nvpmodel --help”.

The ARM Cortex-A cores themselves are not designed for hard real time. Cache hits and misses alone can result in run-to-run variation. Sometimes assigning core affinity can help (see the example below).
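
As a minimal sketch, assuming your binary is the hypothetical “./radar_app”, you can pin the process to a single core from the shell with taskset (part of util-linux), e.g. to core 3:
taskset -c 3 ./radar_app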

Linux has real-time extensions available, and you can give your process a higher priority through cgroups. Audio software does this; it is not hard real-time, but it is useful (see the example below).
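
A related quick experiment (separate from cgroups) is to run the process under the SCHED_FIFO real-time scheduling class with chrt; again using the hypothetical “./radar_app” binary and an arbitrary priority of 50:
sudo chrt -f 50 ./radar_app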

Hard real-time systems normally use Cortex-R or Cortex-M cores. These tend not to offer as much performance, but they do allow deterministic execution.

There are things you can do in this environment to improve timing, but these are not “fixes” so much as tweaks for different circumstances.

Also note that when you execute over a network, the network itself has MTU/MRU settings (MRU and MTU work together as a pair between two network devices). Depending on the protocol and other networking behavior (such as a full buffer triggering an immediate send versus waiting for a timeout to send a partial payload), the network itself can add some variation. If the timing is taken on the Jetson at the moment the process starts, you can probably ignore networking, but if latency and variation are measured over Ethernet, then you can’t rule out Ethernet being part of the issue.
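
One way to separate device-side time from network effects is to profile on the Jetson itself. On TX2-era JetPack releases the CUDA toolkit ships nvprof, which reports per-kernel GPU times independently of anything happening over Ethernet (hypothetical binary name again):
nvprof ./radar_app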

In the end you will probably need to test with maximum performance set first, and with that in place, describe your use case and whichever issues of that use case you are still running into.
