"nvidia-smi" not working. Nvidia-Xavier Soc(ZF-ProAI)

Hello everyone,

I am working on a device called ZF-ProAI, which is composed of a safety microcontroller (Aurix) and a performance microprocessor (NVIDIA Xavier SoC). The NVIDIA Xavier SoC includes an ARM-based octa-core CPU and a Volta GPU.

The technical specification of the ZF-ProAI lists the NVIDIA Xavier SoC with an 8-core CPU @ 2.1 GHz and a Volta GPU (4 TPC), running Linux tegra-ubuntu 4.14.78-rt44-tegra. The hardware is sold with this OS preinstalled and with CUDA 10.1 for AI development.

nvidia@tegra-ubuntu:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I am unable to find out which driver is installed. Can anyone help me with this issue?
Thank you in advance.

“nvidia-smi” is only used with PCI-based drivers, i.e., discrete GPUs (dGPUs). The Jetsons all have integrated GPUs (iGPUs) wired directly to the memory controller. Thus, any application that relies on PCI discovery will fail; it simply is not compatible with this hardware.
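On an L4T/Jetson-style image the integrated GPU driver version is reported by the Tegra release file rather than by “nvidia-smi”. A minimal sketch of how to check it, assuming the standard L4T layout (paths on the ZF-ProAI image may differ):

# L4T / Tegra driver release string (standard L4T location; may not exist on a custom image)
head -n 1 /etc/nv_tegra_release
# Board identification from the device tree
cat /proc/device-tree/model
# CUDA toolkit version, assuming CUDA was installed in the usual location
/usr/local/cuda/bin/nvcc --version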

I have no idea what the ZF-ProAI device is; you need to contact the ZF-ProAI team directly to get support.

Thank you for the reply,

I am new to the world of GPUs. When I use the lspci command, some PCIe devices show up. Does this mean that an NVIDIA GPU is connected to the PCIe slot?

nvidia@tegra-ubuntu:~$ lspci
0000:00:00.0 PCI bridge: NVIDIA Corporation Tegra PCIe x8 Endpoint (rev a1)
0001:00:00.0 PCI bridge: NVIDIA Corporation Tegra PCIe x1 Root Complex (rev a1)
0004:00:00.0 PCI bridge: NVIDIA Corporation Tegra PCIe x4/x8 Endpoint/Root Complex (rev a1)

Or are these iGPUs?

Anything you see with “lspci” is on a PCI bus (PCIe is just serial PCI…a different physical connection, but the same logical connection type). You won’t find a Jetson showing its GPU with “lspci” since the GPU is not on a PCI connector (the wiring goes directly to the memory controller bus, and the memory controller knows what physical address to use). Restated, any GPU that shows up in “lspci” is discrete (a dGPU), and commands like “nvidia-smi” require a dGPU.
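If you want to confirm that the integrated GPU is actually present, it shows up through the Tegra kernel module and sysfs rather than through “lspci”. A rough sketch, assuming the usual L4T names (the “nvgpu” module and the “gpu.0” sysfs node), which may differ on the ZF-ProAI image:

# The Tegra iGPU driver is the "nvgpu" kernel module, not the desktop "nvidia" module
lsmod | grep nvgpu
# Current GPU load as exposed by the Tegra driver (assumes the standard gpu.0 sysfs node)
cat /sys/devices/gpu.0/load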

The devices in your example could be anything, but more verbosity (an option to lspci) can shed more light on what is actually there. If you look at your “plain vanilla” lspci output you will see that each line starts with a bus and slot/function description. You can pick out just a single device using that with the “-s” option. As an example, you could tell lspci to list only the “0000:00:00.0” device:
lspci -s 0000:00:00.0

Not particularly useful unless you have a lot of PCI devices, but when you combine that with maximum verbosity (which requires using sudo), then it is quite useful:
sudo lspci -s 0000:00:00.0 -vvv
(you might want to use that with a pager since it is a big response: “sudo lspci -s 0000:00:00.0 -vvv | less”)

An example of the last lines of a fully verbose dGPU on a desktop PC would be:

        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

(which happens to be an NVIDIA VGA device…the “modules” are not actually a list of things in use, but rather a list of drivers known to work with the hardware…the current driver in this case is the “nvidia” GPU driver)
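If you just want the driver/module information without wading through the full verbose dump, lspci’s “-k” option prints it per device, and sysfs shows the bound driver directly (the address below is just the first one from your list):

# Kernel driver in use and known modules, for every PCI device
lspci -k
# Which driver (if any) is bound to a specific address; prints nothing if no driver is bound
readlink /sys/bus/pci/devices/0000:00:00.0/driver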

Of the devices you listed, anything which is a root complex is basically a PCIe controller intended to see and manage PCIe devices. An endpoint is a device being controlled by a root complex (there might be a bridge in between, but the final authority for the device talking to the host is the root complex).
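lspci can also draw that hierarchy as a tree, which makes the root complex/bridge/endpoint relationships easier to see (the output will of course look different on your board):

# Tree view of PCI buses, bridges, and the devices behind them
sudo lspci -tv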

To know more about what each of those listed lspci devices actually is, you’d want to examine the fully verbose output, and the question is mostly answered by seeing which driver is in use. I doubt any of those are GPUs, but it is possible that any “endpoint” could be a dGPU.
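As a practical shortcut you could loop over the three addresses from your earlier output and pull out just the driver lines. A small sketch (the grep pattern is only an illustration of what to look for):

# Print only the driver information for each of the three listed addresses
for dev in 0000:00:00.0 0001:00:00.0 0004:00:00.0; do
    echo "=== $dev ==="
    sudo lspci -s "$dev" -vvv | grep -E 'Kernel driver|Kernel modules' || echo "  (no driver line found)"
done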
