Launching GPU-Enabled Applications with Podman

Overview

We are trying to create and start a pod that includes a container accessing the GPU, using Podman. However, an error occurs at step 4 below. Please tell me how to resolve it.

Host machine settings

  • Install CUDA Toolkit 12.2
  • Install NVIDIA Container Toolkit
  • Generate the CDI Specification file for Podman
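For reference, the CDI Specification file was generated with the NVIDIA Container Toolkit's `nvidia-ctk` command. This is a sketch of the setup steps; the output path shown is the toolkit's documented default, not necessarily what was used on this host:

```shell
# Generate the CDI specification describing the host's NVIDIA devices
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# List the device names the generated spec exposes
# (typically nvidia.com/gpu=0, nvidia.com/gpu=all, ...)
nvidia-ctk cdi list
```
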

Creating and starting the containers with Podman

  1. Build six container images from Dockerfiles. These images include applications that use the GPU, and they have a track record of running in a Kubernetes (k8s) environment.
  2. Create a .yaml file to start the pod with Podman.
    Here is a sample of the YAML. All six images use almost the same YAML file.
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  restartPolicy: OnFailure
  containers:
    - name: test
      image: localhost/test:latest
      securityContext:
        privileged: true
      volumeMounts:
        - name: key-shm
          mountPath: /dev/shm/
      device:
        - nvidia-gpu
  volumes:
    - name: key-shm
      hostPath:
        path: /dev/shm/
        type: Directory
  3. Start the pod with the podman command.

I started it using the following command: podman play kube test.yaml
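As a sanity check outside of `podman play kube`, a CDI device can also be passed directly to `podman run`. The device name `nvidia.com/gpu=all` here assumes the default device kind produced by `nvidia-ctk cdi generate`:

```shell
# Request all GPUs through CDI and verify the driver libraries
# (including libcuda.so.1) are injected into the container
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```

If this works but the kube YAML does not, the problem is likely in how the YAML requests the GPU device rather than in the host setup.
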

  4. An error occurs.

The exact error message varies depending on the container. Here is one example:

Pod:
3e9afb45b9047c9a0f6b0d511b59e8d480794fcf4909804c93e9df72e8d4fd06
Container:
106d1afed2092c329aa8329a6ceeb6ccb2e48a1fb58052e2cf21b39bfe9d3a4d

error starting container 106d1afed2092c329aa8329a6ceeb6ccb2e48a1fb58052e2cf21b39bfe9d3a4d: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: invalid expression: OCI runtime error
./test: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
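If more environment information is needed, I can provide the output of checks like the following. This is a sketch; the container name `test-pod-test` assumes Podman's usual `<pod>-<container>` naming from the YAML above:

```shell
# Confirm the CDI spec is installed and lists GPU devices
nvidia-ctk cdi list

# Check that the driver userspace library is visible on the host
ldconfig -p | grep libcuda

# Inspect which devices the failing container was actually given
podman inspect test-pod-test --format '{{ .HostConfig.Devices }}'
```
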