Jetson TX2 (4GB) crashes after SError (continued.)

cbaumann · June 22, 2023, 12:57pm

Continuing the discussion from Jetson TX2 (4GB) crashes after SError and Machine check error, as that thread was closed automatically.

We now have some new information about this issue: as you suggested, we disabled the Denver cores completely on some of our devices.

This indeed made the problem disappear.

However, as I said earlier, this is not a permanent solution.

So can you indicate next steps to further pinpoint the problem?
Are you aware of some issues that would explain this behavior?

WayneWWW · June 26, 2023, 4:12am

Actually, that is the permanent solution, as we disable those 2 cores by default on purpose.

cbaumann · June 26, 2023, 5:05am

Are you serious? So you are selling a 6-core device of which only 4 cores are usable? Is this documented anywhere?

WayneWWW · June 26, 2023, 5:17am

There are some explanations in the release notes document.

5.15 Increased Kernel Launch Latency on Denver 2 Cores

And indeed, the default software does not enable these 2 cores. You can refer to some earlier posts about this issue.

cbaumann · June 26, 2023, 8:46am

Thank you very much for these links. We don’t have issues with using the denver cores though. We have issues with occasional SError related to using the denver cores.

We are using the following commands to set up clocking and power management:

nvpmodel -m 5
jetson_clocks
echo 0 > /sys/kernel/debug/tegra_cpufreq/B_CLUSTER/cc3/enable

The last command originates from CPU Throttling more on 4.9 than 4.4 - #15 by cquast.

The nvpmodel -m 5 command uses this custom configuration in /etc/nvpmodel.conf:

< POWER_MODEL ID=5 NAME=MAX_FREQ_ALL >
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_A57 MIN_FREQ 2035200
CPU_A57 MAX_FREQ 2035200
CPU_DENVER MIN_FREQ 2035200
CPU_DENVER MAX_FREQ 2035200
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
GPU MIN_FREQ 0
GPU MAX_FREQ 1300500000
GPU_POWER_CONTROL_DISABLE GPU_PWR_CNTL_DIS auto
EMC MAX_FREQ 1866000000
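
A minimal sketch for checking that this mode really brings both Denver cores online. It assumes, as the cpusets in this post do, that the Denver cores are CPUs 1 and 2, and it reads the standard Linux file /sys/devices/system/cpu/online. The helper names cpu_in_list and denver_status are made up for illustration.

```shell
# Return 0 if CPU number $1 is covered by a kernel-style CPU list $2,
# e.g. "0,3-5" or "0-5".
cpu_in_list() {
    cpu=$1
    list=$2
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        IFS=$old_ifs
        lo=${range%-*}   # "3-5" -> 3, "3" -> 3
        hi=${range#*-}   # "3-5" -> 5, "3" -> 3
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Print the state of CPUs 1 and 2 (assumed here to be the Denver cores).
# $1 is the file to read; it defaults to the kernel's list of online CPUs.
denver_status() {
    online=$(cat "${1:-/sys/devices/system/cpu/online}")
    for cpu in 1 2; do
        if cpu_in_list "$cpu" "$online"; then
            echo "cpu${cpu}: online"
        else
            echo "cpu${cpu}: offline"
        fi
    done
}
```

Calling denver_status without arguments after nvpmodel -m 5 should report both CPUs as online if the custom mode was applied.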

Unlike the document you shared, we are not using taskset to move a task to the Denver cores; we use /sys/fs/cgroup/cpuset/ instead because we want to move single threads only. We first create a new cpuset for each Denver core like this:

PATH_PREFIX=/sys/fs/cgroup/cpuset
mk_cpuset() {
    name=$1
    cpus=$2

    # Creating the directory makes the kernel populate it with several
    # pseudo-files for us to communicate through.
    mkdir "${PATH_PREFIX}/${name}"

    # Let's first set the desired CPUs.
    /bin/echo "$cpus" > "${PATH_PREFIX}/${name}/cpuset.cpus"

    # Allow memory node 0. A cpuset needs both cpus and mems set
    # before tasks can be attached to it.
    /bin/echo 0 > "${PATH_PREFIX}/${name}/cpuset.mems"

    # Keep the scheduler from load balancing across the CPUs in this set.
    /bin/echo 0 > "${PATH_PREFIX}/${name}/cpuset.sched_load_balance"

    # The rest of the pseudo-files are left at their defaults.
    # Perhaps there are other flags we could set to increase
    # performance.
}

mk_cpuset "verity-rt1" "1"
mk_cpuset "verity-rt2" "2"

Then we use those cpusets by moving threads into them, e.g.:

echo $tid > /sys/fs/cgroup/cpuset/verity-rt1/tasks
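
The same step can be wrapped in a small helper that checks the cpuset exists before writing and then prints the thread's allowed CPUs from /proc. This is only a sketch: move_tid and the CPUSET_ROOT override are hypothetical names for illustration, not taken from the original scripts.

```shell
# Root of the cpuset hierarchy; overridable so the helper can be tried
# against a scratch directory (hypothetical convenience).
CPUSET_ROOT=${CPUSET_ROOT:-/sys/fs/cgroup/cpuset}

# Move one thread (by TID) into the named cpuset.
#   $1 = thread ID, $2 = cpuset name (e.g. verity-rt1)
move_tid() {
    tid=$1
    name=$2
    dir="${CPUSET_ROOT}/${name}"

    if [ ! -d "$dir" ]; then
        echo "no such cpuset: ${name}" >&2
        return 1
    fi

    # Writing a TID to "tasks" moves just that thread, not the whole process.
    echo "$tid" > "${dir}/tasks" || return 1

    # Show where the thread may now run, if it is visible in /proc.
    grep '^Cpus_allowed_list' "/proc/${tid}/status" 2>/dev/null || true
    return 0
}
```

For example, move_tid "$tid" verity-rt1 performs the same write as the command above and, if the move succeeded, the Cpus_allowed_list line it prints should match the cpuset's CPUs.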

Can you spot anything in this flow that would explain these SErrors?

WayneWWW · June 26, 2023, 8:51am

Hi,

Basically, we cannot tell you why the SErrors happened just by looking through your commands.

If you want us to help check this issue, please provide a method that reproduces it 100% of the time on the latest BSP and an NVIDIA devkit.

For TX2, the latest BSP is rel-32.7.4, which was released recently.

Please also be aware that we cannot guarantee when a fix will be ready. If some of the affected files are not open source, you may need to wait until the next release.

system · July 19, 2023, 6:38am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
