Failed Watchdog timeout in thread server_main in CloudXRRemoteHMD after 10.440015 seconds. Aborting

I’m currently seeing this error at the end of the vrserver log when I start SteamVR. It causes SteamVR to crash and then block the CloudXR add-on on the next start. Is this something that I can configure around? Can I turn off the watchdog for the CloudXR threads?

Hi @marc24 - Can you please send over your server logs?

Thanks!

Veronica

Hi Veronica,

Attached are the CloudXR server logs, and the Steam logs. The machine has just booted, and we’re starting SteamVR automatically on user login with a Windows scheduled task. It comes up and shows the “Unexpected SteamVR Error” dialog that says “SteamVR has encountered a critical error.” and has buttons to restart or quit SteamVR. These log snapshots are taken at this point. If I then restart SteamVR, it will come up in safe mode with the Gamepad and CloudXR add-ons blocked. Please let me know if there is more info I can gather for you. Thanks.

/marc

cloudxrlogs.zip (1.1 KB)
steamlogs.zip.zip (850.9 KB)

Hi @Veronica_NVIDIA,

We have also tested this with CloudXR 3.0 and are seeing the same issue. The “Known issues” part of the docs talks about SteamVR coming up in safe mode and recommends using SteamVR Beta. Is that what you would suggest for this issue? Thanks.

/marc

Also adding a Steam report for one of our instances, in case that provides information that is useful…

/marc

SteamVR-2021-09-03-AM_09_15_09.txt (463.0 KB)

We are seeing the exact same issue on our Windows VMs running CloudXR Server. SteamVR runs for about 10 seconds before reporting the Critical Error (and offering to Quit or Restart SteamVR).

Failed Watchdog timeout in thread server_main in CloudXRRemoteHMD after 11.886055 seconds. Aborting.

The error definitely originates from CloudXRRemoteHMD, but the CloudXR Server / Streamer Server logs give no indication of it. Is there any way to obtain more information or log files about what happens in CloudXRRemoteHMD?

It is worth noting that this behaviour seems to happen only right after booting a machine, so it is probably some kind of race condition at system startup, and may be specific to virtual machines (with virtualized hardware). After a manual restart of SteamVR, everything usually works as expected. A manual restart is not really an option for us, though, so we would love to automate this and wait until the system is in a state where the watchdog timeout does not occur. The problem is that we don’t know what that target state is.

Did you find a solution or workaround, @marc24?

Hi @vrdev,

Thanks for prodding me to reply. We did figure out a workaround last week. It turns out this issue is related specifically to AWS machines with EBS. When you start an EBS-backed machine in AWS, the EBS volume needs to be hydrated from the snapshot that it is based on. The snapshot data is backed by S3, so the first read of data from the volume is very slow. We are now using a little PowerShell script to basically cat {files} > /dev/null for all the files in the CloudXR and SteamVR directories before we start SteamVR. This hydrates those files into the volume so that later reads are fast, and we haven’t had any crashes since then. More information on this is here: Initialize Amazon EBS volumes - Amazon Elastic Compute Cloud. Hope this helps.
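For anyone who wants to try the same idea, here is a rough sketch of the pre-read step in Python. It is not the PowerShell script mentioned above, just an illustration of the technique: walk the directories, read every file once, and throw the data away. The two default paths are guesses at typical install locations and will need to be changed for your machine.

```python
import os
import sys

CHUNK_SIZE = 1024 * 1024  # read in 1 MiB pieces to keep memory use small


def warm_file(path, chunk_size=CHUNK_SIZE):
    """Read one file end to end, discard the data, return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def warm_directories(directories):
    """Read every file under the given directories.

    Returns a (files_read, bytes_read) tuple. Directories that do not
    exist are silently ignored by os.walk.
    """
    files_read = 0
    bytes_read = 0
    for directory in directories:
        for root, _dirs, names in os.walk(directory):
            for name in names:
                path = os.path.join(root, name)
                try:
                    bytes_read += warm_file(path)
                    files_read += 1
                except OSError as exc:
                    # Locked or unreadable files are skipped, not fatal.
                    print(f"skipped {path}: {exc}", file=sys.stderr)
    return files_read, bytes_read


if __name__ == "__main__":
    # ASSUMPTION: hypothetical install locations, adjust as needed.
    default_dirs = [
        r"C:\Program Files\NVIDIA Corporation\CloudXR",
        r"C:\Program Files (x86)\Steam\steamapps\common\SteamVR",
    ]
    dirs = sys.argv[1:] or default_dirs
    files, total = warm_directories(dirs)
    print(f"read {files} files, {total} bytes")
```

Run it with the directories as arguments before launching SteamVR, for example from the same scheduled task. It only helps on the first read after boot; once the blocks have been pulled into the volume, running it again is harmless but does nothing useful.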

/marc