Boot Time Problem: nv_virtual_shutdown Service Fails

Software Version
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its number)
other

SDK Manager Version
1.9.3.10904
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other


Summary:

Our belief is that the NVIDIA Linux systemd service's dependency on the required kernel modules being loaded isn't working.

There is a device node the service needs to wait for: it exists when we check at runtime, but it is apparently not yet present during boot.

The file is /dev/tegra_hv_pm_ctl
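As a first diagnostic (our suggestion, not part of the original report), you can inspect what ordering dependencies the unit currently declares:

tegra-ubuntu:~$ systemctl cat nv_virtual_shutdown.service
tegra-ubuntu:~$ systemctl list-dependencies --after nv_virtual_shutdown.service

If neither output mentions the device or the module that creates it, systemd is free to start the service before /dev/tegra_hv_pm_ctl exists.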

We are seeing an error that manifests as:

tegra-ubuntu:~$ systemctl status nv_virtual_shutdown
● nv_virtual_shutdown.service - Hypervisor initiated Shutdown Service
     Loaded: loaded (/lib/systemd/system/nv_virtual_shutdown.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2024-02-20 18:47:01 UTC; 23min ago
   Main PID: 1001 (code=exited, status=255/EXCEPTION)

Feb 20 18:47:01 tegra-ubuntu systemd[1]: Started Hypervisor initiated Shutdown Service.
Feb 20 18:47:01 tegra-ubuntu bash[1013]: chmod: cannot access '/dev/tegra_hv_pm_ctl': No such file or directory
Feb 20 18:47:01 tegra-ubuntu bash[1019]: hv_pm_ctl_init: Failed to open /dev/tegra_hv_pm_ctl, -2
Feb 20 18:47:01 tegra-ubuntu systemd[1]: nv_virtual_shutdown.service: Main process exited, code=exited, status=255/EXCEPTION
Feb 20 18:47:01 tegra-ubuntu systemd[1]: nv_virtual_shutdown.service: Failed with result 'exit-code'.

However, we can see that file actually does exist:

tegra-ubuntu:~$ ls -l /dev/tegra_hv_pm_ctl
crw-------. 1 root root 476, 0 Feb 20 18:47 /dev/tegra_hv_pm_ctl

Checking journalctl, we can see:

tegra-ubuntu:~$ journalctl -b-1 -u nv_virtual_shutdown
-- Logs begin at Tue 2024-02-13 19:28:37 UTC, end at Tue 2024-02-20 19:21:59 UTC. --
Feb 20 18:47:01 tegra-ubuntu systemd[1]: Started Hypervisor initiated Shutdown Service.
Feb 20 18:47:01 tegra-ubuntu bash[1013]: chmod: cannot access '/dev/tegra_hv_pm_ctl': No such file or directory
Feb 20 18:47:01 tegra-ubuntu bash[1019]: hv_pm_ctl_init: Failed to open /dev/tegra_hv_pm_ctl, -2
Feb 20 18:47:01 tegra-ubuntu systemd[1]: nv_virtual_shutdown.service: Main process exited, code=exited, status=255/EXCEPTION
Feb 20 18:47:01 tegra-ubuntu systemd[1]: nv_virtual_shutdown.service: Failed with result 'exit-code'.
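One way to confirm the timing (a diagnostic sketch of our own; USEC_INITIALIZED is standard udev metadata, though we have not verified it for this driver) is to compare when udev initialized the node against when the service started, on the same monotonic clock:

tegra-ubuntu:~$ udevadm info --name=/dev/tegra_hv_pm_ctl | grep USEC_INITIALIZED
tegra-ubuntu:~$ journalctl -b -u nv_virtual_shutdown -o short-monotonic

If the service's start timestamp is earlier than the device's initialization time, the race is confirmed.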

Request

Is there a known workaround for this problem, or a known fix?

Should there be a different dependency requirement in the .service file?

Or maybe the service should have an automatic restart on failure as well?
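To make those two ideas concrete, here is a hedged sketch of a drop-in created with sudo systemctl edit nv_virtual_shutdown.service (our own suggestion, not an NVIDIA-provided fix; the dev-tegra_hv_pm_ctl.device dependency is only effective if udev tags the device with TAG+="systemd", which we have not verified for this driver):

[Unit]
# order after the device unit for the node, if udev creates one
After=dev-tegra_hv_pm_ctl.device
Wants=dev-tegra_hv_pm_ctl.device

[Service]
# retry instead of failing permanently when the node appears late
Restart=on-failure
RestartSec=2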

I didn’t encounter the issue on my system. According to my system status, the service has been active and running without any problems for the past five days.

Active: active (running) since Fri 2024-02-16 19:01:48 UTC; 5 days ago

Have you tried power cycling or reflashing your system to see if that resolves the issue?

Thanks for the quick response.

Yes, power-cycling the system has generally cleared the issue, though we don't yet have a large enough sample size to say it works 100% reliably.

That said, power-cycling is not a workable fix for our use case in practice.

We need to stop the problem from occurring entirely rather than relying on ad-hoc power cycles, mainly because the failures confuse the users of our system and are disruptive.

Any ideas about a deeper fix here?

A further important point is that this happens occasionally and non-deterministically.

It doesn't happen on every boot cycle, but seemingly at random.

It appears to be a timing issue.

Could you please provide detailed steps or specific conditions that reliably reproduce this issue? Understanding the exact circumstances in which it occurs will help us investigate and identify a more permanent solution. Thank you.

So the problem is a race condition between the appearance of the /dev/tegra_hv_pm_ctl device file, which Linux uses to communicate with the hypervisor, and the start of nv_virtual_shutdown.service.

The solution is to add ConditionPathExists=/dev/tegra_hv_pm_ctl to the [Unit] section of the nv_virtual_shutdown.service file, and to create a new file called nv_virtual_shutdown.path with the following content:

[Path]
PathChanged=/dev/tegra_hv_pm_ctl
Unit=nv_virtual_shutdown.service

[Install]
WantedBy=multi-user.target

Then enable this path file using sudo systemctl enable nv_virtual_shutdown.path
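Putting the whole change together, a sketch of the install steps (assuming the stock unit location /lib/systemd/system/nv_virtual_shutdown.service shown in the status output above; a drop-in keeps the ConditionPathExists= change from being overwritten by package updates):

# 1) Add the condition via a drop-in
sudo systemctl edit nv_virtual_shutdown.service
# in the editor, add:
#   [Unit]
#   ConditionPathExists=/dev/tegra_hv_pm_ctl

# 2) Create the path unit
sudo tee /etc/systemd/system/nv_virtual_shutdown.path > /dev/null <<'EOF'
[Path]
PathChanged=/dev/tegra_hv_pm_ctl
Unit=nv_virtual_shutdown.service

[Install]
WantedBy=multi-user.target
EOF

# 3) Reload units and enable the path watcher
sudo systemctl daemon-reload
sudo systemctl enable nv_virtual_shutdown.path

With both pieces in place, a boot where the device node is late no longer fails the service: the condition turns the early start into a no-op, and the path unit starts the service once /dev/tegra_hv_pm_ctl appears.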


Thank you for providing the solution. Could you confirm if you encountered this issue on a system running the default DRIVE OS 6.0.8.1 configuration?

Yes, I have personally experienced this problem on a DRIVE AGX system running 6.0.8.1.

Do you mean without the change NVIDIA previously provided via the forum, which masked the service if it couldn't find this same file? Without that patch installed, reboot won't work at all, because the service won't be running.

Could you provide detailed steps to reproduce the issue by rebooting, e.g., using "sudo reboot"?

So it's a race condition between the driver loading (and that device file appearing) and the nv_virtual_shutdown service starting with the expectation that the file exists. If you reboot your DRIVE AGX units enough times with the patch NVIDIA previously provided (which stops masking the service when that device file doesn't exist), you'll see it.

We reboot the units with the command:

echo 1 | sudo tee /sys/class/tegra_hv_pm_ctl/tegra_hv_pm_ctl/device/trigger_sys_reboot > /dev/null
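For repeated reboot soak testing, a minimal sketch (hypothetical script and log path, run as root after each boot, e.g., from a @reboot cron entry) that stops cycling as soon as the failure is captured:

#!/bin/bash
# reboot-soak.sh: log the service state, then trigger the next hypervisor reboot
sleep 60   # let boot settle before sampling
state=$(systemctl is-active nv_virtual_shutdown)
echo "$(date -Is) nv_virtual_shutdown: ${state}" >> /var/log/nv_shutdown_soak.log
# stop the loop on failure so the broken boot can be inspected
[ "${state}" = "active" ] || exit 0
echo 1 > /sys/class/tegra_hv_pm_ctl/tegra_hv_pm_ctl/device/trigger_sys_reboot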

The required solution will be incorporated in the next release. Thanks.
