DeepStream 6.0.1 Triton GRPC memory leak

• Hardware Platform (Jetson / GPU) : GPU
• DeepStream Version: 6.0.1 (nvcr.io/nvidia/deepstream:6.0.1-triton)
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version: 470.82.00
• Issue Type: Bug, host memory leak
• How to reproduce the issue ?

I have tested the issue with the RTSP in-out Python sample, with TensorRT as the backend. Running the sample with the default inference configuration files does not appear to leak memory, but replacing the nvinferserver configuration with one that uses a standalone Triton server instance over gRPC appears to leak memory consistently, roughly in proportion to the number of input sources and their resolution/bitrate. The Python process is the one that leaks, so it doesn't seem to be an issue on Triton's server side. The fact that the leak scales with the input size suggests the nvinferserver gRPC client is not freeing frames correctly, or is buffering them indefinitely (does it even buffer/queue?). I have only tested with the standalone Triton server running on the same machine as the DeepStream app, and I used YOLOv4 for the gRPC test. Updating the Triton server to r22.04 had no effect on the leak.
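For reference, the growth is visible directly in the resident set size of the Python process. Below is a minimal sketch of the kind of RSS logging that shows it (not my exact tooling; it assumes psutil is installed, and the PID, file name and interval are arbitrary):

#!/usr/bin/env python3
# Minimal sketch: periodically log the resident set size (RSS) of the
# DeepStream Python process to a CSV so growth over time is visible.
# Assumes psutil is installed; the PID is passed as the first argument,
# and the output file name and interval are arbitrary choices.
import sys
import time
import psutil

proc = psutil.Process(int(sys.argv[1]))
start = time.time()

with open("rss_log.csv", "w") as f:
    f.write("elapsed_s,rss_mib\n")
    while True:
        try:
            rss_mib = proc.memory_info().rss / (1024 * 1024)
        except psutil.NoSuchProcess:
            break  # the monitored process has exited
        f.write(f"{time.time() - start:.0f},{rss_mib:.1f}\n")
        f.flush()
        time.sleep(10)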

The Dockerfile and nvinferserver configuration I’ve used are as follows:

FROM nvcr.io/nvidia/deepstream:6.0.1-triton

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Tokyo

WORKDIR /workspace

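# The CUDA repository signing key bundled with this image has been rotated; remove the old key and fetch the new one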
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub

RUN apt-get update && apt-get install -y --no-install-recommends \
    zip \
    python3-gi python3-dev python3-gst-1.0 python-gi-dev git python-dev \
    python3 python3-pip python3.8-dev cmake g++ build-essential libglib2.0-dev \
    libglib2.0-dev-bin libtool m4 autoconf automake \
    libgirepository1.0-dev \
    tzdata && \
    rm -rf /var/lib/apt/lists/*

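# Clone the DeepStream Python apps at the v1.1.1 commit and build the bundled gst-python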
RUN git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps
WORKDIR /workspace/deepstream_python_apps
RUN git checkout 9bffad1aea802f6be4419712c0a50f05d6a2d490
RUN git submodule update --init
WORKDIR /workspace/deepstream_python_apps/3rdparty/gst-python/
RUN git config --global http.sslverify false
RUN sh autogen.sh
RUN make
RUN make install
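# Build the pyds bindings wheel for Python 3.8 and install it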
WORKDIR /workspace/deepstream_python_apps/bindings
RUN mkdir build
WORKDIR /workspace/deepstream_python_apps/bindings/build
RUN cmake .. -DPYTHON_MAJOR_VERSION=3 -DPYTHON_MINOR_VERSION=8
RUN make
RUN pip3 install pyds-1.1.1-py3-none-linux_x86_64.whl

WORKDIR /workspace
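# Fetch the YOLOv4 DeepStream sample and build the custom bbox parser library referenced by the nvinferserver config below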
RUN git clone https://github.com/NVIDIA-AI-IOT/yolov4_deepstream.git
RUN cp -r yolov4_deepstream/deepstream_yolov4 /opt/nvidia/deepstream/deepstream/sources/deepstream_yolov4
ENV CUDA_VER=11.4


WORKDIR /opt/nvidia/deepstream/deepstream/sources/deepstream_yolov4/nvdsinfer_custom_impl_Yolo
RUN make

infer_config {
    unique_id: 5
    gpu_ids: [0]
    max_batch_size: 1
    backend {
        triton {

            model_name: "main_model"
            version: -1
            grpc {
                url: "localhost:8001"
            }
        }
    }

    preprocess {
        network_format: IMAGE_FORMAT_RGB
        tensor_order: TENSOR_ORDER_LINEAR
        maintain_aspect_ratio: 0
        normalize {
            scale_factor: 0.003921569
            channel_offsets: [0, 0, 0]
        }
    }

    postprocess {
        labelfile_path: "labels.txt"
        detection {
            num_detected_classes: 80
            custom_parse_bbox_func: "NvDsInferParseCustomYoloV4"
            nms {
                confidence_threshold: 0.3
                iou_threshold: 0.5
                topk: 100
            }
        }
    }

    extra {
        copy_input_to_host_buffers: false
    }

    custom_lib {
        path: "/opt/nvidia/deepstream/deepstream/sources/deepstream_yolov4/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so"
    }
}

input_control {
    process_mode: PROCESS_MODE_FULL_FRAME
    interval: 0
}

output_control {
    output_tensor_meta: true
}
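
For reference, wiring this config into the Python sample only requires the primary GIE to be an nvinferserver element pointing at the file; a minimal sketch is below (the element and property names are those of the gst-nvinferserver plugin; the config file name is just illustrative):

# Minimal sketch: nvinferserver replaces nvinfer as the primary GIE, and its
# "config-file-path" property points at the gRPC config above.
# The config file name used here is illustrative.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pgie = Gst.ElementFactory.make("nvinferserver", "primary-inference")
pgie.set_property("config-file-path", "config_triton_grpc_yolov4.txt")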

Hi @g-ogawa
do you mean you have a DS Python application that is based on

and the nvinferserver config?

From your Dockerfile, it seems that alone is not enough to get this set up; could you share a complete setup?

Thanks!

Hello @mchi, thanks for the response.

While my final application is different, I have observed the leak using the linked application (deepstream-rtsp-in-rtsp-out) as is, with no modification other than swapping the nvinferserver config for one that uses gRPC and YOLOv4. And yes, that is also the repository I used to compile the post-processing plugin for YOLOv4.

Right now the only converted model I have is a retrained one that I cannot publish, but if needed I can convert the publicly available YOLOv4 model and tweak the Dockerfile so the leaking setup is easier to reproduce. Would that help?

Yes, any repro of the memory leak you found would be helpful.

Thanks!

I’ve made a simple script that runs the Triton instance and the deepstream-rtsp-in-rtsp-out app, and a Dockerfile to build an image that runs that script. Due to the model size I couldn’t attach it here, so I made it available on this Google Drive link. Please let me know when it’s downloaded so I can take it down.

I’ve included instructions on how to run it, but it should be enough to just build the image and run a container. An RTSP source needs to be provided, though the app works with file:// inputs as well, so a sufficiently long video file could be used instead.

I have included here the container's memory usage when running the app for 30 minutes with a 1080p, 10 fps, 4000 kbit/s H.264 RTSP input stream.

While this level of memory increase for one stream is tolerable, we'd like to run quite a few streams continuously on a system that won't be restarted often, so steadily growing memory usage would be a problem.

We got an issue when tritonserver was loading your model;
can you run the model properly?

Source is file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
Starting Triton…
I0519 11:24:34.845895 1368 metrics.cc:290] Collecting metrics for GPU 0: Tesla T4
I0519 11:24:34.846343 1368 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I0519 11:24:35.197487 1368 libtorch.cc:1029] TRITONBACKEND_Initialize: pytorch
I0519 11:24:35.197515 1368 libtorch.cc:1039] Triton TRITONBACKEND API version: 1.4
I0519 11:24:35.197521 1368 libtorch.cc:1045] 'pytorch' TRITONBACKEND API version: 1.4
I0519 11:24:35.197593 1368 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so
2022-05-19 20:24:35.355798: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0519 11:24:35.396670 1368 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I0519 11:24:35.396726 1368 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I0519 11:24:35.396742 1368 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I0519 11:24:35.396763 1368 tensorflow.cc:2209] backend configuration:
{}
I0519 11:24:35.396845 1368 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
I0519 11:24:35.399432 1368 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I0519 11:24:35.399450 1368 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I0519 11:24:35.399455 1368 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I0519 11:24:35.409558 1368 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/openvino/libtriton_openvino.so
I0519 11:24:35.423172 1368 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I0519 11:24:35.423191 1368 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I0519 11:24:35.423197 1368 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
I0519 11:24:35.854411 1368 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7ffa70000000' with size 268435456
I0519 11:24:35.856374 1368 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0519 11:24:35.858400 1368 backend_factory.h:45] Create TritonBackendFactory
I0519 11:24:35.858421 1368 plan_backend_factory.cc:49] Create PlanBackendFactory
I0519 11:24:35.858427 1368 plan_backend_factory.cc:56] Registering TensorRT Plugins
I0519 11:24:35.858476 1368 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0519 11:24:35.858494 1368 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0519 11:24:35.858508 1368 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0519 11:24:35.858524 1368 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0519 11:24:35.858534 1368 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0519 11:24:35.858554 1368 logging.cc:52] Registered plugin creator - ::Clip_TRT version 1
I0519 11:24:35.858564 1368 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0519 11:24:35.858575 1368 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0519 11:24:35.858586 1368 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0519 11:24:35.858611 1368 logging.cc:52] Registered plugin creator - ::ScatterND version 1
I0519 11:24:35.858625 1368 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0519 11:24:35.858636 1368 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0519 11:24:35.858645 1368 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0519 11:24:35.858672 1368 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0519 11:24:35.858689 1368 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0519 11:24:35.858704 1368 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0519 11:24:35.858715 1368 logging.cc:52] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I0519 11:24:35.858739 1368 logging.cc:52] Registered plugin creator - ::EfficientNMS_TRT version 1
I0519 11:24:35.858752 1368 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0519 11:24:35.858762 1368 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0519 11:24:35.858785 1368 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0519 11:24:35.858800 1368 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0519 11:24:35.858814 1368 logging.cc:52] Registered plugin creator - ::Split version 1
I0519 11:24:35.858831 1368 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0519 11:24:35.858851 1368 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0519 11:24:35.858863 1368 ensemble_backend_factory.cc:47] Create EnsembleBackendFactory
I0519 11:24:35.859050 1368 autofill.cc:138] TensorFlow SavedModel autofill: Internal: unable to autofill for '1' due to no version directories
I0519 11:24:35.859078 1368 autofill.cc:151] TensorFlow GraphDef autofill: Internal: unable to autofill for '1' due to no version directories
I0519 11:24:35.859101 1368 autofill.cc:164] PyTorch autofill: Internal: unable to autofill for '1' due to no version directories
I0519 11:24:35.859129 1368 autofill.cc:196] ONNX autofill: Internal: unable to autofill for '1' due to no version directories
I0519 11:24:35.859153 1368 autofill.cc:209] TensorRT autofill: Internal: unable to autofill for '1' due to no version directories
W0519 11:24:35.859161 1368 autofill.cc:243] Proceeding with simple config for now
I0519 11:24:35.859170 1368 model_config_utils.cc:637] autofilled config: name: "1"

E0519 11:24:35.859913 1368 model_repository_manager.cc:1919] Poll failed for model directory '1': unexpected platform type for 1
I0519 11:24:35.859953 1368 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0519 11:24:35.860049 1368 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    |                                                                 | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so        | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so| {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so| {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so      | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0519 11:24:35.860065 1368 model_repository_manager.cc:570] BackendStates()
I0519 11:24:35.860083 1368 server.cc:586]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0519 11:24:35.860201 1368 tritonserver.cc:1718]
+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.13.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory |
| | cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /workspace/triton_server/main_model/ |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+

I0519 11:24:35.860233 1368 server.cc:234] Waiting for in-flight requests to complete.
I0519 11:24:35.860241 1368 model_repository_manager.cc:534] LiveBackendStates()
I0519 11:24:35.860248 1368 server.cc:249] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
I0519 11:24:35.860258 1368 triton_backend_manager.cc:101] unloading backend 'pytorch'
I0519 11:24:35.860266 1368 triton_backend_manager.cc:101] unloading backend 'tensorflow'
I0519 11:24:35.860278 1368 triton_backend_manager.cc:101] unloading backend 'onnxruntime'
I0519 11:24:35.860292 1368 triton_backend_manager.cc:101] unloading backend 'openvino'
error: creating server: Internal - failed to load all models
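
As a quick sanity check before starting the DeepStream app, the standalone server can be queried with the Triton gRPC client to confirm the model actually loaded; a minimal sketch (assumes tritonclient[grpc] is installed, and uses the URL and model name from the nvinferserver config above):

# Sketch: verify the standalone Triton instance is live and has loaded the
# model before launching the DeepStream app. Assumes tritonclient[grpc] is
# installed; the URL and model name match the nvinferserver config above.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready("main_model"))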