Slower Speeds with Different NVIDIA Image

Hi, I am running into an issue where switching my base Docker image results in slower speeds (specifically for inference with deep learning models).

Previously I was using a base image of nvidia/cuda:12.2.0-runtime-ubuntu20.04 with the following Dockerfile:

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

EXPOSE 8080

# The directory is created by root. This sets permissions so that any user can
# access the folder.
RUN mkdir -m 777 -p /usr/app /home
WORKDIR /usr/app
ENV HOME=/home

# Install python 3.9
# (installing 3.10 like this added 2GB to the image size)
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends  \
    python3.9 python3.9-distutils python3.9-dev curl build-essential

RUN curl -sSL https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

However, I recently switched to using a base image of nvcr.io/nvidia/pytorch:23.08-py3 with a Dockerfile that looks like:

FROM nvcr.io/nvidia/pytorch:23.08-py3

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

EXPOSE 8080

# The directory is created by root. This sets permissions so that any user can
# access the folder.
RUN mkdir -m 777 -p /usr/app /home
WORKDIR /usr/app
ENV HOME=/home

COPY requirements.txt requirements.txt

# Remove the pinned torch and nvidia packages from the requirements so they
# don't conflict with what's already in the base image.
RUN sed -i '/^torch==/d; /^nvidia-/d' requirements.txt
RUN pip install -r requirements.txt --no-cache-dir

ENV PYTHONPATH="${PYTHONPATH}:/usr/app/"

ENV LD_LIBRARY_PATH=/usr/local/cuda/compat/lib.real:/usr/local/lib/python3.10/dist-packages/torch/lib:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
ENV PATH=/usr/local/nvm/versions/node/v16.20.0/bin:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin

I added the LD_LIBRARY_PATH and PATH environment variables based on this StackOverflow post, because I was getting the same warning described there.
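
For reference, here is the kind of quick check I can run in both containers to compare what PyTorch actually sees (just a diagnostic sketch; it assumes torch is importable in both images):

import torch

# Versions of the stack that inference actually runs against.
print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())

# Confirm the GPU is visible and identify it.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("capability:", torch.cuda.get_device_capability(0))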

Setting those variables does let me utilize CUDA, and inference is faster than it is without CUDA, but it is still ~3x slower than it was with the previous image.
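
For reference, a timing loop along these lines is what I have in mind when I say ~3x slower (torchvision's resnet50 and the input shape here are just stand-ins for my actual model and data):

import time
import torch
import torchvision

# Stand-in model; my real model is loaded differently.
model = torchvision.models.resnet50().eval().cuda()
x = torch.randn(1, 3, 224, 224, device="cuda")

# Warm-up so lazy initialization and cuDNN autotuning don't skew the numbers.
with torch.no_grad():
    for _ in range(10):
        model(x)
torch.cuda.synchronize()

# Timed loop; synchronize so we measure actual GPU work, not just kernel launches.
start = time.perf_counter()
with torch.no_grad():
    for _ in range(100):
        model(x)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / 100 * 1000:.2f} ms")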

This is all running on an NVIDIA Tesla T4 GPU.
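
In case it helps, I can also report what the GPU itself is doing while inference runs, with something like this (assuming the pynvml / nvidia-ml-py package is available in the container):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Confirm which device the container actually sees.
print("GPU:", pynvml.nvmlDeviceGetName(handle))

# Utilization and SM clock while the inference workload is running.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print("GPU util %:", util.gpu, "memory util %:", util.memory)
print("SM clock MHz:", pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM))

pynvml.nvmlShutdown()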

Does anyone have an idea as to what could be causing these slowdowns? If you need any additional information, I’m happy to provide it.