I was directed here by support. My Azure A100x8 instance (try, ID: wdfmpritd) has a crashed network agent. Ports show as ‘Unhealthy’, SSH times out, and the CLI throws a not_found error.
Also Port 8888 is also returning a 502 Bad Gateway. Cloudflare shows ‘Host Error’. The machine is completely unreachable.
Because this is an Azure cluster, the dashboard states ‘This environment does not support stopping and starting’. I am completely locked out and have critical data on this volume.
Can an engineer please either:
-
Force a backend hard-reboot of the node to restore the network.
-
Detach the volume and mount it to a recovery instance.
A standard user cannot fix this from the UI. Please advise.