Does Jetson Orin + k8s-device-plugin + MPS work?

Hi, I’m running Kubernetes with the latest nvidia/k8s-device-plugin:v0.17.1 on a Jetson AGX Orin node (JetPack 6.2), and I encountered a persistent crash issue when enabling the MPS Control Daemon.

I know that JetPack 6.1 has added official support for MPS on Jetson, and I can confirm that MPS works correctly on my Jetson AGX Orin outside of containers — for example, I’m able to manually start the MPS control daemon using nvidia-cuda-mps-control -d on the host without any issues.

However, when I deploy the nvidia/k8s-device-plugin:v0.17.1 with MPS enabled and set replicas: 4 for MPS control, I noticed something unexpected: when I check the available GPU resources via:

kubectl describe node NODE | grep nvidia.com/gpu

Kubernetes reports 8 GPUs, which is inconsistent with my actual hardware and the expected number of MPS partitions. I was expecting to see 4 logical GPUs corresponding to the replicas: 4 setting.
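
To make my expectation explicit, here is a minimal sketch of the arithmetic I had in mind. It assumes the plugin advertises one nvidia.com/gpu resource per replica of each physical GPU, and that the Orin node exposes a single integrated GPU. The function name is only for illustration and is not part of the plugin.

```python
def expected_gpu_resources(physical_gpus: int, replicas: int) -> int:
    """Return how many nvidia.com/gpu resources should be advertised
    when every physical GPU is shared `replicas` ways (assumed model)."""
    if physical_gpus < 0:
        raise ValueError("physical_gpus must be non-negative")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return physical_gpus * replicas


# One integrated GPU shared four ways.
print(expected_gpu_resources(physical_gpus=1, replicas=4))
```

Under that assumption, a count of 8 with replicas: 4 would imply two physical GPUs, which this node does not have.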

In addition, when I run:

kubectl get pods -n nvidia-device-plugin -o wide

I see that some of the nvidia-device-plugin-mps-control-daemon pods are in a CrashLoopBackOff state.

The detailed error message from the logs of container mps-control-daemon-ctr is:

E0904 09:35:53.754750 312 main.go:84] error starting plugins: error getting daemons: error building device map: error building device map from config.resources: error building GPU device map: error visiting device: error building Device: error getting device paths: error getting GPU device minor number: Not Supported
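
The wrapped error is easier to read when split on its ": " separators. The small sketch below (the helper name is hypothetical, the log line is the one above) prints each layer on its own line so the innermost failure stands out.

```python
from typing import List

LOG_LINE = (
    "E0904 09:35:53.754750 312 main.go:84] error starting plugins: "
    "error getting daemons: error building device map: "
    "error building device map from config.resources: "
    "error building GPU device map: error visiting device: "
    "error building Device: error getting device paths: "
    "error getting GPU device minor number: Not Supported"
)


def split_error_chain(log_line: str) -> List[str]:
    """Split a wrapped error message into its layers, outermost first."""
    # Drop the log header (everything up to the first "] "), if present.
    header, sep, message = log_line.partition("] ")
    if not sep:
        message = header
    return [part.strip() for part in message.split(": ")]


chain = split_error_chain(LOG_LINE)
for depth, part in enumerate(chain):
    # Indent each layer by its nesting depth.
    print("  " * depth + part)
```

The last two layers are "error getting GPU device minor number" and "Not Supported", so the failure appears to come from the device minor number lookup rather than from MPS itself. That reading is my assumption from the message text, not something I have confirmed in the plugin source.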

So I’d like to ask:
Is it currently feasible to share GPU resources via MPS on Jetson Orin devices in Kubernetes?
This method works well on traditional x86 servers with discrete GPUs (e.g., RTX or A100 series), but I’m unsure whether the same applies to Jetson-class embedded GPUs. If not supported, is there any recommended workaround or roadmap for enabling this?

Hi,

There is a known issue on Jetson: k8s currently needs to be configured with Docker as the container runtime.
Containerd does not work right now.

Do you think this might be the issue?

Thanks.

Hi,

I am currently using k8s with Docker rather than with containerd.

I also made a mistake earlier: the count of 16 was not for the Orin node I mentioned. In fact, no GPU resources are found on it at all.

I checked the issue you mentioned, and I don't think mine is the same as that one.

I will later try disabling MPS and check if GPU resources are available.
But MPS is essential in my scenario, so could you please help me with the error message?

Thanks.