Built-in Self-Test for Jetson AGX Xavier

aravind.d · February 27, 2023, 7:42am

Hi,

We are building an avionics system using Jetson AGX Xavier. As part of the system design, we need to include several BIST (Built-In Self-Test) diagnostics along with the application code.

When I researched online, I found that desktop-grade NVIDIA GPUs provide “NVML” (NVIDIA Management Library), but it is not supported on Jetson AGX Xavier.

I also found that Jetson supports “tegrastats”.

However, “tegrastats” seems limited and doesn’t cover hardware tests comprehensively.

I am looking for an NVML-like SDK for Jetson devices.

Can someone please help?

Thanks & Regards,
Aravind

linuxdev · February 27, 2023, 6:48pm

I can’t give you a good answer, but here is some information you might find useful…

Desktop PCs use a discrete GPU (dGPU) via the PCI bus (and a dGPU has its own RAM). Jetsons have an integrated GPU (iGPU) tied directly to the memory controller (and shares system RAM). Much of the GPU management and detection software you will find for the dGPU world depends on PCI query. None of that works on the Jetson since it isn’t a PCI GPU.

It might be possible to write a “virtual” PCI-to-iGPU emulator in kernel space, but it’d be difficult if it is to work with “stock” PCI query tools (I think it could be done by NVIDIA, but it wouldn’t be easy even for them).

tegrastats is aware of the iGPU. I’ve seen other people asking for a “more evolved” tegrastats, which probably wouldn’t be too difficult. It might even be on NVIDIA’s radar, not sure.

Meanwhile, some of the statistics and data that applications such as tegrastats use come from reading files in “/sys” (these are not real files; they exist in RAM and are actually part of drivers pretending to be files). At other times tegrastats will perform system calls to the kernel (which might be via a PCI call in the case of a dGPU). If you don’t have the “strace” program, install it with “sudo apt-get install strace”. It lets you watch system calls as they occur (it requires sudo/root access).

One can watch the system calls (which are shown in something very close to C syntax) and figure out what the program is calling. If you are sufficiently motivated, you could learn to make those system calls directly, or else read the files noted by strace (an “openat” call is shown whenever a file is opened, along with the file’s full path).

tegrastats queries about once or twice per second (not sure of the exact rate), and you can watch the calls live like this:
sudo strace tegrastats
(use CTRL-c when done)

Similarly, a log file could instead be created:
sudo strace -oTraceLog.txt tegrastats
(then examine TraceLog.txt; use CTRL-c after you’ve read stats maybe twice, there is an enormous amount of output)
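
Since the log is enormous, a short script can pull out just the files that were opened. This is only a rough sketch: it assumes the default strace text format for “openat” lines, and the function names (opened_paths, summarize) and the “/sys” filter are made up for illustration.

```python
import re
from collections import Counter

# Matches strace lines of the general form:
#   openat(AT_FDCWD, "<path>", <flags>) = <return value>
OPENAT_RE = re.compile(r'openat\([^,]+,\s*"([^"]+)"[^)]*\)\s*=\s*(-?\d+)')


def opened_paths(lines):
    """Count the paths that were opened successfully (return value >= 0)."""
    counts = Counter()
    for line in lines:
        match = OPENAT_RE.search(line)
        if match and int(match.group(2)) >= 0:
            counts[match.group(1)] += 1
    return counts


def summarize(log_path="TraceLog.txt", prefix="/sys"):
    """Return (path, count) pairs under `prefix`, most frequently opened first."""
    with open(log_path, errors="replace") as log:
        counts = opened_paths(log)
    return [(path, n) for path, n in counts.most_common() if path.startswith(prefix)]
```

Calling summarize() on the TraceLog.txt created above should list the “/sys” entries tegrastats polls, which are the candidates to read directly. Failed opens (negative return value) are skipped on purpose, since those files do not exist on the system being traced.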

Note that you will also see ioctl calls, which are basically an extension of “treat everything like a file” for when the file is really a pseudo file that is in turn part of a driver.

ltrace (“library trace”) is similar, but for calls to linked libraries.

Because of the lack of a PCI interface I don’t think you will find much “preexisting” software for management. On the other hand, any system call you find which might be of interest is something you could ask about here on the forums. Any file which is queried is something you can read directly yourself without any modification or special software (look for the “openat” calls).
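
As a rough illustration of reading such a file directly, here is a minimal sketch of a self-test style check. The helper names (read_pseudo_file, check_int_range) are hypothetical, and which paths and limits make sense depends on what strace shows on your own board; nothing here is an NVIDIA API.

```python
from pathlib import Path


def read_pseudo_file(path):
    """Return the stripped text of a sysfs/procfs entry, or None if it cannot be read."""
    try:
        return Path(path).read_text(errors="replace").strip()
    except OSError:
        return None


def check_int_range(path, low, high):
    """Pass only if the file can be read and holds an integer within [low, high]."""
    text = read_pseudo_file(path)
    if text is None:
        return False
    try:
        value = int(text)
    except ValueError:
        return False
    return low <= value <= high
```

For example, check_int_range("/sys/class/thermal/thermal_zone0/temp", 0, 95000) would test a temperature reported in millidegrees Celsius through the standard Linux thermal interface. The zone number and the limits are assumptions here and would need to be confirmed on the actual hardware.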

aravind.d · February 28, 2023, 5:34am

@linuxdev: Thank you for the detailed answer.

It is unfortunate that we will not be able to use NVML calls (NVML API Reference Guide, GPU Deployment and Management Documentation, nvidia.com) on Jetson devices.

I understood your suggestion regarding strace. Let me check those things.

Just wanted to ask one more question: do you know of any reference/library/application code already implemented for performing self-tests on Jetson devices? Since Jetson devices have been used in aerospace applications for many years, NVIDIA might have a reference design for it.

Thanks,
Aravind

linuxdev · February 28, 2023, 7:55pm

Someone from NVIDIA might know, but I do not.

This is a longshot, but if you happen to be using Concurrent’s software for the soft/hard realtime, then they would likely have some very good answers:
https://concurrent-rt.com/partners/technology/nvidia/

I say this because developing something close to hard realtime implies a lot of testing, including latencies which would apply to your case.

aravind.d · March 1, 2023, 5:21am

Thanks @linuxdev

