I have tested the issue with the RTSP in/out Python sample; the backend is TensorRT. Running the sample with the default inference configuration files does not seem to leak memory, but replacing the nvinferserver configuration with one that uses a standalone Triton server instance over gRPC leaks memory consistently, roughly in proportion to the number, resolution, and bitrate of the input sources. The Python process is the one that grows, so it doesn't look like an issue on Triton's server side. The fact that the leak scales with the input size might indicate that the nvinferserver gRPC client isn't freeing frames correctly, or is buffering them indefinitely (does it even buffer/queue?).
I have only tested with the standalone Triton server running on the same machine as the DeepStream app, and I used YOLOv4 for the gRPC test. Updating the Triton server to r22.04 had no effect on the leak.
The Dockerfile and nvinferserver configuration I’ve used are as follows:
FROM nvcr.io/nvidia/deepstream:6.0.1-triton
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Tokyo
WORKDIR /workspace
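# The CUDA apt repository signing key was rotated; drop the old key and fetch the new one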
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
RUN apt-get update && apt-get install -y --no-install-recommends \
zip \
python3-gi python3-dev python3-gst-1.0 python-gi-dev git python-dev \
python3 python3-pip python3.8-dev cmake g++ build-essential libglib2.0-dev \
libglib2.0-dev-bin python-gi-dev libtool m4 autoconf automake \
libgirepository1.0-dev \
tzdata && \
rm -rf /var/lib/apt/lists/*
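# Clone deepstream_python_apps at a pinned revision and build the gst-python submodule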
RUN git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps
WORKDIR /workspace/deepstream_python_apps
RUN git checkout 9bffad1aea802f6be4419712c0a50f05d6a2d490
RUN git submodule update --init
WORKDIR /workspace/deepstream_python_apps/3rdparty/gst-python/
RUN git config --global http.sslverify false
RUN sh autogen.sh
RUN make
RUN make install
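# Build the pyds bindings and install the resulting wheel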
WORKDIR /workspace/deepstream_python_apps/bindings
RUN mkdir build
WORKDIR /workspace/deepstream_python_apps/bindings/build
RUN cmake .. -DPYTHON_MAJOR_VERSION=3 -DPYTHON_MINOR_VERSION=8
RUN make
RUN pip3 install pyds-1.1.1-py3-none-linux_x86_64.whl
WORKDIR /workspace
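# Fetch the YOLOv4 DeepStream sample and build its custom bounding-box parser library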
RUN git clone https://github.com/NVIDIA-AI-IOT/yolov4_deepstream.git
RUN cp -r yolov4_deepstream/deepstream_yolov4 /opt/nvidia/deepstream/deepstream/sources/deepstream_yolov4
ENV CUDA_VER=11.4
WORKDIR /opt/nvidia/deepstream/deepstream/sources/deepstream_yolov4/nvdsinfer_custom_impl_Yolo
RUN make
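As for the nvinferserver configuration: the gRPC variant differs from the default CAPI configs mainly in the backend block, where model_repo { root: ... } is replaced by a grpc { url: ... } entry pointing at the standalone server. Roughly, it looks like the sketch below (model name, thresholds, and paths are placeholders rather than my exact values):
infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    triton {
      model_name: "yolov4"
      version: -1
      grpc {
        # standalone Triton instance on the same machine, default gRPC port
        url: "localhost:8001"
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    maintain_aspect_ratio: 0
    normalize {
      scale_factor: 0.0039215686
    }
  }
  postprocess {
    labelfile_path: "labels.txt"
    detection {
      num_detected_classes: 80
      custom_parse_bbox_func: "NvDsInferParseCustomYoloV4"
      nms {
        confidence_threshold: 0.3
        iou_threshold: 0.5
        topk: 20
      }
    }
  }
  custom_lib {
    path: "/opt/nvidia/deepstream/deepstream/sources/deepstream_yolov4/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so"
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  interval: 0
}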
While my final application is different, I have observed the leak using the linked application (deepstream-rtsp-in-rtsp-out) as is, with no modifications other than pointing the nvinferserver config at one that uses gRPC and YOLOv4. And yes, that is also the repository I used to compile the post-processing plugin for YOLOv4.
Right now I only have a converted, retrained model that I cannot publish, but if needed I can convert the publicly available YOLOv4 model and tweak the Dockerfile so the leaking experiment is easier to reproduce. Would that help?
I’ve made a simple script that runs the Triton instance and the deepstream-rtsp-in-rtsp-out app, plus a Dockerfile that builds an image running that script. Due to the model size I couldn’t attach the archive here, so I’ve made it available at this Google Drive link. Please let me know once you’ve downloaded it so I can take it down.
I’ve included instructions on how to run it, but it should be enough to just build the image and run a container (a rough example is below). The RTSP source needs to be provided, though the app also works with file:// inputs, so I suppose a sufficiently long video could be used instead.
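The exact commands are in the instructions inside the archive; as a rough sketch (the image tag, environment variable, and flags here are placeholders, not necessarily what the script uses):
# build the image from the attached Dockerfile
docker build -t ds-triton-grpc-leak .
# point the app at a camera, or at a file:// URI of a sufficiently long video
docker run --gpus all --rm --network host \
  -e RTSP_INPUT="rtsp://<camera-host>:554/stream" \
  ds-triton-grpc-leak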
I have included here the container memory usage when running the app for 30 minutes with a 1080p, 10 fps, 4000 kbit/s H.264 RTSP input stream.
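For reference, the numbers can be reproduced by periodically sampling the container's memory with docker stats, along these lines (the container name is a placeholder):
CONTAINER=deepstream-app   # placeholder: name of the running app container
# sample memory usage every 30 s for roughly 30 minutes
for i in $(seq 60); do
  docker stats --no-stream --format "{{.Name}} {{.MemUsage}}" "$CONTAINER"
  sleep 30
done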
While this level of memory growth is tolerable for a single stream, we’d like to run quite a few streams continuously on a system that won’t be restarted very often, so steady memory growth would be a problem.