I have a couple of questions regarding the use of Aerial on Kubernetes:
Is there a specific reason why emptyDir is used for mounting /dev/shm inside the aerial-l1-pod? Also, do the MPS and IPC services both use this directory while running Aerial? The screenshot below is from aerial-l1-pod.yaml in the aerial-l1 Helm chart.
Welcome to our community!
1 a). Is there a specific reason why emptyDir is used for mounting /dev/shm inside the aerial-l1-pod?
In Aerial, nvipc requires shared memory between L1 and L2, and we use /dev/shm as the shared memory interface. Assuming a single K8s pod contains both L1 and L2, we just need a shared directory that both L1 and L2 can see, and emptyDir is fine for that. L1 and L2 can also run in separate pods, but then some other mount would be necessary so that L1 and L2 see the same filesystem.
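To make the single-pod case concrete, here is a minimal sketch of two containers sharing one emptyDir volume mounted at /dev/shm. It is not copied from the aerial-l1 Helm chart: the pod name, container names, image references, and volume name are hypothetical, and `medium: Memory` is an assumption (a common choice for shared memory, which makes the volume tmpfs-backed).

```yaml
# Illustrative sketch only; not taken from the aerial-l1 Helm chart.
# Pod, container, image, and volume names are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: aerial-l1-l2-example
spec:
  containers:
    - name: l1
      image: example.com/aerial-l1:tag   # hypothetical image reference
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm            # nvipc shared memory lives here
    - name: l2
      image: example.com/l2:tag          # hypothetical image reference
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm            # same volume, so L2 sees L1's shared memory files
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory                   # assumption: tmpfs-backed volume
```

An emptyDir volume exists only for the lifetime of the pod and is visible to every container in that pod that mounts it, which is why it is sufficient here and why it no longer works once L1 and L2 are split into separate pods.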
1 b) Do the MPS and IPC services both use this directory while running Aerial?
No, MPS and IPC do not use the same directory. The shared memory IPC (nvipc) uses /dev/shm. MPS typically uses CUDA_MPS_PIPE_DIRECTORY=/var and CUDA_MPS_LOG_DIRECTORY=/var in the L1 container. The deployment assumption is that L2 doesn’t use the GPU at all, so it doesn’t need MPS.
If this deployment assumption changes and L2 also uses the GPU, then the L1 and L2 containers would need to see the same filesystem view.
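As a rough illustration of the MPS settings mentioned above, the two environment variables could be set on the L1 container like this. This is a fragment of a pod spec, not the actual chart contents; the container name and image are hypothetical, and only the two variable names and the /var value come from the answer.

```yaml
# Fragment of a pod spec; illustrative only.
# Container name and image are hypothetical placeholders.
containers:
  - name: l1
    image: example.com/aerial-l1:tag    # hypothetical image reference
    env:
      - name: CUDA_MPS_PIPE_DIRECTORY   # where the MPS control pipes are created
        value: /var
      - name: CUDA_MPS_LOG_DIRECTORY    # where the MPS daemon writes its logs
        value: /var
```

Since these directories are private to the L1 container in this sketch, they are independent of the /dev/shm volume used by nvipc.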
Would it be possible to use the Kubernetes Device Plugin MPS feature when deploying Aerial, or are there any known limitations or conflicts?
For Aerial, we do not recommend sharing the GPU with other workloads. The presence of other workloads can affect Aerial’s ability to meet its processing timing budget. Instead of sharing the GPU through the Kubernetes Device Plugin MPS feature, we recommend MIG for multitenancy on a single GPU. The Aerial user guide has an introductory description of deploying Aerial in MIG mode.
Thanks for your answer! It really helped me set the tone for my development. I have one more question, and your reply would be very helpful for me in making some decisions.
Q1. Is it possible to run Aerial with MIG enabled on Kubernetes?
I’m trying to set up MIG in Kubernetes using the NVIDIA GPU Operator and Device Plugin, and deploy Aerial. I would greatly appreciate it if you could confirm whether this setup is officially supported — in particular, whether it is acceptable to use the GPU Operator and Device Plugin for enabling MIG, even if this approach differs from the method described in the official Aerial MIG deployment guide.
@hyejung.hwang
Yes, you can run Aerial with MIG enabled on Kubernetes.
This assumes you have already enabled MIG and partitioned the GPU in your K8s environment. For example, if the MIG device mig-3g.48gb is available, then in the Aerial pod YAML file specify resources/requests/nvidia.com/mig-3g.48gb: 1, as shown below:
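A minimal sketch of such a resource request, assuming the device plugin exposes the MIG slice under the resource name nvidia.com/mig-3g.48gb (this is how the "mixed" MIG strategy names resources). The pod name, container name, and image are hypothetical. Kubernetes does not accept a request for an extended resource without a matching limit, so the sketch sets both to the same value.

```yaml
# Illustrative sketch only; pod, container, and image names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: aerial-l1-mig-example
spec:
  containers:
    - name: l1
      image: example.com/aerial-l1:tag   # hypothetical image reference
      resources:
        requests:
          nvidia.com/mig-3g.48gb: 1      # one 3g.48gb MIG instance
        limits:
          nvidia.com/mig-3g.48gb: 1      # must equal the request for extended resources
```

Running `kubectl describe node` on the GPU node should list the MIG resource names the device plugin actually advertises, which is the name to use in place of the one assumed here.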