Slower Speeds with Different NVIDIA Image

Hi, I am running into an issue where switching my base Docker image results in slower speeds (specifically for inference with deep learning models).

Previously I was using a base image of nvidia/cuda:12.2.0-runtime-ubuntu20.04 with the following Dockerfile:

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

EXPOSE 8080

# The directory is created by root. This sets permissions so that any user can
# access the folder.
RUN mkdir -m 777 -p /usr/app /home
WORKDIR /usr/app
ENV HOME=/home

# Install python 3.9
# (installing 3.10 like this added 2GB to the image size)
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends  \
    python3.9 python3.9-distutils python3.9-dev curl build-essential

RUN curl -sSL https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

However, I recently switched to using a base image of nvcr.io/nvidia/pytorch:23.08-py3 with a Dockerfile that looks like:

FROM nvcr.io/nvidia/pytorch:23.08-py3

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

EXPOSE 8080

# The directory is created by root. This sets permissions so that any user can
# access the folder.
RUN mkdir -m 777 -p /usr/app /home
WORKDIR /usr/app
ENV HOME=/home

COPY requirements.txt requirements.txt

# Remove the pinned torch and nvidia packages from the requirements so they
# don't conflict with what's already in the base image.
RUN sed -i '/^torch==/d; /^nvidia-/d' requirements.txt
RUN pip install -r requirements.txt --no-cache-dir

ENV PYTHONPATH="${PYTHONPATH}:/usr/app/"

ENV LD_LIBRARY_PATH=/usr/local/cuda/compat/lib.real:/usr/local/lib/python3.10/dist-packages/torch/lib:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
ENV PATH=/usr/local/nvm/versions/node/v16.20.0/bin:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin

I added the LD_LIBRARY_PATH and PATH environment variables based on this StackOverflow post, because I was getting the same warning described there.
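
For reference, here is the kind of quick check I can run in both containers to compare what PyTorch actually sees (just a diagnostic sketch; it assumes torch is importable in both images):

import torch

# Versions of the stack that inference actually runs against.
print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())

# Confirm the GPU is visible and identify it.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("capability:", torch.cuda.get_device_capability(0))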

Setting those variables does let me utilize CUDA, and inference is faster than it is without CUDA, but it is still ~3x slower than it was with the previous image.
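
For reference, a timing loop along these lines is what I have in mind when I say ~3x slower (torchvision's resnet50 and the input shape here are just stand-ins for my actual model and data):

import time
import torch
import torchvision

# Stand-in model; my real model is loaded differently.
model = torchvision.models.resnet50().eval().cuda()
x = torch.randn(1, 3, 224, 224, device="cuda")

# Warm-up so lazy initialization and cuDNN autotuning don't skew the numbers.
with torch.no_grad():
    for _ in range(10):
        model(x)
torch.cuda.synchronize()

# Timed loop; synchronize so we measure actual GPU work, not just kernel launches.
start = time.perf_counter()
with torch.no_grad():
    for _ in range(100):
        model(x)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / 100 * 1000:.2f} ms")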

This is all running on an NVIDIA Tesla T4 GPU.
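
In case it helps, I can also report what the GPU itself is doing while inference runs, with something like this (assuming the pynvml / nvidia-ml-py package is available in the container):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Confirm which device the container actually sees.
print("GPU:", pynvml.nvmlDeviceGetName(handle))

# Utilization and SM clock while the inference workload is running.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print("GPU util %:", util.gpu, "memory util %:", util.memory)
print("SM clock MHz:", pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM))

pynvml.nvmlShutdown()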

Does anyone have an idea as to what could be causing these slowdowns? If you need any additional information, I’m happy to provide it.