I am running a containerized application on a RHEL 9 server (a Cisco C220 M5 rackmount server), managed with Podman. The local GPU card (an NVIDIA Tesla T4, the model built for Cisco C-Series rackmount servers) is recognized by the server's integrated management controller (Cisco CIMC) and is reported as healthy in the hardware status.
Within the OS, I have installed the NVIDIA GPU drivers and confirmed that the card is visible with "lspci | grep -i nvidia", which returns the expected information about the installed GPU card.
I have also installed the NVIDIA CUDA toolkit and verified its installation and its interaction with the GPU: "nvidia-smi -L" lists the card, "nvidia-smi" displays the card details along with current runtime information, and "nvcc --version" reports the CUDA toolkit version. I have further installed the NVIDIA cuDNN libraries and verified them with the "mnistCUDNN" sample, which returns "Test Passed!!". Lastly, I have installed the NVIDIA Container Toolkit, set up the Container Device Interface (CDI) by generating the required specification, and validated the setup with "nvidia-ctk cdi list", which displays information about the locally installed GPU card.
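For reference, the CDI setup was roughly as follows (the output path is the default suggested in the NVIDIA Container Toolkit documentation; my exact invocation may have differed slightly):

    # generate the CDI specification describing the installed GPUs
    sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

    # confirm the GPU devices appear in the generated spec
    nvidia-ctk cdi list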
Using the appropriate podman create/run command, I referenced the GPU card by including the "--device nvidia.com/gpu=all" switch in the CLI command, then created and started my application container. I then ran "podman exec -it <container> nvidia-smi" (which runs "nvidia-smi" inside the running container) and got the expected output showing the GPU card information and runtime specs.
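Roughly, the sequence of commands was as follows (the container name "myapp" and the image name are placeholders for my actual application):

    # create and start the container with all GPUs exposed via CDI
    podman run -d --name myapp --device nvidia.com/gpu=all my-application-image

    # verify that the GPU is visible from inside the running container
    podman exec -it myapp nvidia-smi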
Given all this, it would seem that my containerized application is able to interact with the physical GPU card on the server… but the application runs very slowly and with noticeable stutter, which the application's support documentation suggests is a symptom of it not using the GPU at all. Further, if I run the containerized application while keeping an SSH session open to the host and running "watch nvidia-smi", I see only a very small, brief blip of activity on the GPU (from 0% to 3% utilization, lasting about half a second), which the application documentation also says indicates that the application is not using the GPU.
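For reference, the host-side monitoring I ran during the test was along these lines (the one-second interval and the query form of nvidia-smi are just standard options, nothing application-specific):

    # refresh the full nvidia-smi view every second while the container app runs
    watch -n 1 nvidia-smi

    # or sample just utilization and memory use once per second
    nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1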
What else can I do at the application or podman container level to ensure that my containerized application can fully make use of the locally installed GPU card?