Enabling RAS drivers on NX dev kit (Reliability, Availability and Serviceability)

Hi NVIDIA community, I’m interested in using the Xavier NX’s built-in Reliability, Availability and Serviceability (RAS) drivers to understand the ECC performance of the Carmel CPUs in an extreme environment.

Two goals:
(1) Report any RAS errors (both correctable and uncorrectable) to the serial UART2 debug interface. Also report the origin of the error, i.e. which CPU unit.

(2) “Spoof” or trigger a RAS error to mimic the conditions the Jetson will live in, and validate that software recovery from the error happens correctly.

I’ve seen the following 3 threads:
First
Second
Third

However, I’m unable to find any of the drivers or nodes mentioned, such as:

drivers/ras/arm64_ras.c
drivers/platform/tegra/carmel_ras.c
/sys/kernel/debug/carmel_ras/RAS_MCA_ERR-trip

In fact, those directory locations don’t even exist on my machine.

After lots of grep, the only references I see to anything RAS-related are two .h header files:

/usr/src/linux-headers-5.10.104-tegra-ubuntu20.04_aarch64/nvidia/include/linux/arm64_ras.h

and

/usr/src/linux-headers-5.10.104-tegra-ubuntu20.04_aarch64/nvidia/include/linux/platform/tegra/carmel_ras.h
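For anyone repeating this search, here is a minimal Python sketch of the same idea as the grep above: walk a directory tree and list files whose names contain "ras" as a separate token (so arm64_ras.h matches but extras.h does not). The function name `find_ras_files` and the matching rule are my own assumptions for illustration, not part of any NVIDIA tooling.

```python
import os
import re

# Assumed matching rule: "ras" not directly preceded or followed by a letter,
# so "carmel_ras.h" matches while "extras.h" and "brass.c" do not.
RAS_NAME = re.compile(r"(?<![A-Za-z])ras(?![A-Za-z])", re.IGNORECASE)


def find_ras_files(root):
    """Return sorted paths under root whose file name contains 'ras' as a token."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if RAS_NAME.search(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling `find_ras_files("/usr/src")` should list the same kind of header paths as above if they are present on the system.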

Also, the dmesg log says “CPU features: detected: RAS Extension Support”. I guess that means something?
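In case it helps with checking logs after a run, below is a rough Python sketch that filters saved log text (for example, captured dmesg or UART output) for lines mentioning RAS. The function name `find_ras_lines` and the keyword rule are assumptions for illustration; I don't know the actual format of RAS error messages on JetPack 5.x, so the pattern would likely need adjusting.

```python
import re

# Assumed keyword rule: "RAS" in any case, not embedded inside a longer word.
RAS_PATTERN = re.compile(r"(?<![A-Za-z])ras(?![A-Za-z])", re.IGNORECASE)


def find_ras_lines(log_text):
    """Return the lines of log_text that mention RAS as a separate token."""
    return [line for line in log_text.splitlines() if RAS_PATTERN.search(line)]


# Small demonstration using the message quoted above plus an unrelated line.
sample = (
    "CPU features: detected: RAS Extension Support\n"
    "usb 1-1: new high-speed USB device\n"
)
for line in find_ras_lines(sample):
    print(line)
```

The same function can be fed the contents of a log file read with `open(path).read()`.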

How can I get the .c drivers installed and fully functional to do the two actions I want above?

I’m using the Jetson Xavier NX dev kit running JetPack 5.0.2, kernel 5.10.104-tegra.

The RAS driver has been moved from the kernel to ATF (ARM Trusted Firmware) in the new releases with kernel 5.10.

Thank you @sumitg ! I see. Will use these docs for ATF then.

And please correct me if I’m wrong, but from the older posts on this topic and a paper I read, it seemed that carmel_ras.c and arm64_ras.c were out-of-the-box implementations provided in JetPack 4.x to capture RAS errors in the UART debug logs (same as any other errors thrown). Injecting errors via the RAS_MCA_ERR-trip node could also be done out of the box.

I’m not seeing any analogous code examples in the ATF docs, though. I’m more interested in seeing the errors as debug logs post-runtime than in polling RAS registers in real time as errors happen.

  • Would I write my own C driver to capture & inject errors to replicate the functionality that was provided in Jetpack 4.x?
  • If so, is there a way to port the old drivers over to Jetpack 5.x to avoid re-doing this from scratch?

Please look into the ‘tf-a-tests’ project and how to deploy it on your device.
The RAS tests are already part of tftf/tests/tests-tegra194.mk in TF-A/tf-a-tests (Gitiles, trustedfirmware.org).
So, if you can run the framework, the tests will be executed automatically on boot.
