How to integrate the GPU resources of two AGX Orins into a Kubernetes (K8s) cluster for LLM inference?

Hi ALL,
Is it possible to use Kubernetes to leverage the GPU resources of two AGX Orins (connected via Ethernet) and run LLM inference across them?
Are there any references on this topic we can study?
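One common pattern (a sketch under assumptions, not an official NVIDIA recipe) is to join both Orins into one cluster, deploy the NVIDIA device plugin so each GPU is advertised as an `nvidia.com/gpu` resource, and then schedule inference pods against that resource. The pod spec below is a hypothetical minimal example; the pod name and container image are placeholders, not real artifacts.

```shell
# Hypothetical sketch: generate a pod spec that requests one Orin GPU
# via the NVIDIA device plugin's nvidia.com/gpu resource.
cat > llm-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  containers:
  - name: llm
    image: my-registry/llm-server:latest   # placeholder image (assumption)
    resources:
      limits:
        nvidia.com/gpu: 1   # one GPU from whichever Orin the pod lands on
EOF
# kubectl apply -f llm-pod.yaml   # apply from the cluster's control plane
```

Note that Kubernetes schedules whole pods onto individual nodes; it does not pool the two GPUs into a single device. If one model needs both Orins, the inference runtime itself must support multi-node (e.g. sharded or pipelined) execution.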

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case; they select the maximum power mode and lock the clocks at their highest frequencies:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
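To confirm the setting took effect, you can query the active power mode afterwards (a quick check, assuming a Jetson where `nvpmodel` is on the PATH; the fallback message is only for non-Jetson machines):

```shell
# Query the active power mode; after "nvpmodel -m 0" it should report MAXN.
mode=$(command -v nvpmodel >/dev/null 2>&1 && sudo nvpmodel -q || echo "nvpmodel not found - not running on a Jetson?")
echo "$mode"
```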

2. Installation

Installation guides for deep learning frameworks on Jetson:

3. Tutorial

Getting-started deep learning tutorial:

4. Report issue

If these suggestions don't help and you want to report an issue to us, please share the model, the commands/steps, and any customized app so that we can reproduce the issue locally.

Thanks!

Hi,

Kubernetes should work on Orin.
But you might need a workaround for the NVIDIA Container Toolkit.
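A quick way to tell whether a given node needs attention (a hedged sketch, assuming containerd as the container runtime and its standard config path) is to check whether an NVIDIA runtime entry is registered:

```shell
# Look for an NVIDIA runtime entry in the containerd config; if absent,
# the container toolkit workaround from the linked topic likely applies.
# -s suppresses the error if the config file does not exist on this machine.
if grep -qs 'nvidia' /etc/containerd/config.toml; then
  result="present"
  echo "NVIDIA runtime appears in containerd config"
else
  result="absent"
  echo "No NVIDIA runtime entry found - workaround may be needed"
fi
```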

Please see the topic below for more info:

Thanks.
