Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): NVIDIA RTX 2060
• DeepStream Version: 6.2 (from the DeepStream 6.2 Docker image)
• JetPack Version (valid for Jetson only): N/A
• TensorRT Version: 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only): 525.105.17
• Issue Type (questions, new requirements, bugs): Question
We are currently migrating our models from the NvInfer plugin to the NvInferServer plugin (for some specific data processing using the Triton Python backend). We have noticed that GPU memory consumption differs significantly between NvInfer and NvInferServer, even though we are using the same TensorRT engine and an equivalent configuration (same batch size of 30, same image preprocessing, same custom lib, same input dims of 3x768x768).
To be more specific, when we use the NvInfer plugin to load the model, memory consumption is about 1194 MB. When we switch to NvInferServer for the same model, memory consumption increases to 2124 MB, nearly double, even though the configuration is equivalent. The image attached below shows the difference in memory usage between the two plugins.
Is there a difference in how NvInfer loads the engine compared to how Triton Server loads it through its TensorRT backend that would explain the different memory usage?
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
We are using a typical ONNX-based model that we cannot disclose, but any ONNX-based model should reproduce this. We let NvInfer's built-in conversion generate the TensorRT engine when the pipeline first runs. After that, we copied the generated engine to the Triton Server model repository and used NvInferServer to run that TensorRT model. The attached configuration file includes all 3 configs needed to reproduce how we ran the model in the pipeline (sorry for putting everything in one file, new users only get to upload one link): pgie_configs.txt (2.4 KB)
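For illustration, here is a minimal sketch of the nvinferserver side of that setup, modeled on the deepstream-test1 Triton sample config; the model name, repository path and preprocessing values are placeholders, not our actual settings:

```
infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 30                      # same batch size the engine was built with
  backend {
    triton {
      model_name: "my_onnx_model"         # placeholder: folder name inside the model repository
      version: -1
      model_repo {
        root: "./triton_model_repo"       # placeholder: repository the nvinfer-built engine was copied into
        strict_model_config: true
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    normalize { scale_factor: 0.0039215697906911373 }
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  interval: 0
}
```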
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or which sample application, and the function description.)
gst-nvinferserver is based on the Triton Inference Server, while gst-nvinfer is based on TensorRT directly. Triton Inference Server and TensorRT are different SDKs with different architectures, so some difference in memory usage is expected.
For your case, please check:
Make sure gst-nvinfer and gst-nvinferserver point to the same TensorRT engine file. gst-nvinfer may rebuild the engine online, while gst-nvinferserver needs the engine pre-built offline. If they are not pointing to the same engine file, the performance and memory numbers are different and not comparable. Make sure the engine is built with exactly the same batch size and precision, e.g. as in the sketch below.
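For example, a minimal sketch of a Triton model config (config.pbtxt) that reuses the engine gst-nvinfer generated; the file name below is a placeholder, and the gst-nvinfer config should reference the same file through its model-engine-file key:

```
# triton_model_repo/my_model/config.pbtxt -- the version folder 1/ holds the engine copied from gst-nvinfer
platform: "tensorrt_plan"
max_batch_size: 30                                    # must match the batch size the engine was built with
default_model_filename: "model_b30_gpu0_fp16.engine"  # placeholder: same file as model-engine-file in gst-nvinfer
# inputs/outputs can be auto-completed by Triton for TensorRT plans when strict model config is disabled
```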
Triton also reserves some small memory pools by default for performance. If your model doesn't need them, you can reduce them or set them to 0, but this might affect your performance; see the sketch after this paragraph.
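A hedged sketch, assuming your DeepStream version exposes the Triton pool sizes in the nvinferserver model_repo settings (Triton's defaults are typically 256 MB of pinned host memory and 64 MB of CUDA memory per GPU):

```
backend {
  triton {
    model_name: "my_model"                 # placeholder
    version: -1
    model_repo {
      root: "./triton_model_repo"
      strict_model_config: true
      pinned_memory_pool_byte_size: 0      # shrink or disable the pinned host-memory pool
      cuda_device_memory {
        device: 0
        memory_pool_byte_size: 0           # shrink or disable the per-GPU CUDA memory pool
      }
    }
  }
}
```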
We can't reproduce this issue with deepstream-test1 using resnet10 or yolov4. When setting batch size to 30 and using the same engine, the GPU memory usage difference is not large, far less than 1 GB.
Can you use deepstream-test1 and a public model to reproduce this issue? If you can still reproduce it, please provide the application, the model, and the whole configuration.
Hey there, sorry for the late reply, we forgot to check for further replies after a while. We have reproduced the problem using the resnet10 model in the DeepStream SDK, and the issue seems to scale with the batch size used in the configuration.
Our steps to reproduce the issue using a public model: we used the ResNet10 Caffe model (PrimaryDetector) provided in the DeepStream 6.2 Docker container and the deepstream-test1 sample.
First, we set the model batch size to 128, let the nvinfer plugin generate the model engine, and ran the pipeline.
We then copied the engine file to the PrimaryDetector folder in triton_model_repo and pointed the Triton configuration at that engine file, with the batch size set to 128 (see the sketch below). We then ran the pipeline using the nvinferserver configuration.
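For reference, the Triton-side changes were only the batch size and the engine file name in that model's config.pbtxt (a sketch; the exact engine file name depends on how nvinfer names the generated engine):

```
# triton_model_repo/PrimaryDetector/config.pbtxt (relevant lines only)
platform: "tensorrt_plan"
max_batch_size: 128
default_model_filename: "resnet10.caffemodel_b128_gpu0_int8.engine"  # engine copied from the nvinfer run
```

The max_batch_size in the nvinferserver infer_config was set to 128 as well.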
The image attached below shows the difference between using nvinfer and nvinferserver: the GPU memory usage difference is almost 1 GB.
Since we need to run models with large batch sizes but are constrained by a memory limit, we will stick with the NvInfer plugin for now. Thank you for the instructions.