• Hardware Platform - RTX 2080
• DeepStream Version - 5.0
• NVIDIA GPU Driver Version - 440.33.01

Hi, I have been trying to run a custom face detection model that produces an output tensor of shape [25270, 6]. Each row corresponds to a different ROI, and the 6 values are for bbox(first…

@duttaneil16 Sorry for the long wait. Since this issue is a little complicated, we had some internal discussions, and here are our conclusions. When running TensorFlow models using Triton Inference Server, the GPU device memory may fall short. The allowed GPU device memory allocation fo…

@duttaneil16 It seems this issue is about nvinfer, not nvinferserver. Either way, we need detailed information about your DS and nvinfer setups and your face detection model so that we can set up a similar environment to reproduce your problem. Do you mind sharing your working direct…

Hi @ersheng, I built a custom pipeline based on deepstream-test1 in which I replaced the nvinfer plugin with nvinferserver. It lives in a folder named fd_tri_v, which contains the config file, the nvdsinfer_custom_impl_Yolo folder with the output parsing function, and the deepstream app file. The config f…

@duttaneil16 What format is your Yolo model in: TensorFlow, TensorRT, or Caffe?

Hi @ersheng, I am using the TensorFlow GraphDef format.

Hi @ersheng, I lowered the tf_gpu_memory_fraction parameter to smaller fractions to see the effect, and it did not change the throughput much. I still have to test some other models where this memory issue occurred. I had earlier tried TF-TRT conversion, but it failed due to some unsupported lay…
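
As a rough illustration of what lowering tf_gpu_memory_fraction means in absolute terms, here is a minimal Python sketch. It assumes the fraction acts as an upper bound on the share of total device memory that TensorFlow may allocate, and it assumes a nominal 8 GiB for the RTX 2080; the helper name `tf_memory_budget_mib` and the sample fractions are hypothetical, not values from this thread.

```python
def tf_memory_budget_mib(total_mib: float, fraction: float) -> float:
    """Return the memory cap in MiB implied by a given memory fraction.

    Assumption: the fraction is applied to the total device memory.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    return total_mib * fraction


# Assumption: nominal 8 GiB card; check nvidia-smi for the real total.
RTX_2080_TOTAL_MIB = 8192

# Hypothetical fractions, only to show how the cap scales.
for fraction in (0.75, 0.5, 0.25):
    budget = tf_memory_budget_mib(RTX_2080_TOTAL_MIB, fraction)
    print(f"fraction={fraction:.2f} -> cap of about {budget:.0f} MiB")
```

Comparing such a cap with the usage reported by nvidia-smi while the pipeline runs should show whether the limit is actually being honored.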

@duttaneil16 Have you tried this? https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#worflow-with-savedmodel

@duttaneil16 Please open a new forum topic for your new question so that we can easily trace these topics. Thanks.

Model run using nvinferserver occupying high GPU memory-usage

duttaneil16 August 5, 2020, 6:37pm 10

Hi @ersheng,
I had tried the suggested TF-TRT guide, but the result showed trt_engine_opts as 0. So I tried the changes to config.pbtxt suggested above for optimization, which converted portions of the graph to TRT engines.
I then had trouble running the model converted on the fly with the changed config, and I have created a topic for that. Here is the reference:

Basically, either the parameters given for the TF-TRT conversion or the model itself seems to be the issue.
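
For anyone checking whether a conversion produced any TensorRT segments at all, a common approach is to count the nodes whose op type is `TRTEngineOp` in the converted graph. The sketch below is self-contained and uses stand-in node objects; with a real TensorFlow GraphDef you would pass `graph_def.node` instead. The helper names and the toy node list are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def count_trt_engine_ops(nodes) -> int:
    """Count nodes whose op type is TRTEngineOp."""
    return sum(1 for node in nodes if node.op == "TRTEngineOp")


def op_histogram(nodes) -> Counter:
    """Tally op types, to see which ops stayed in native TensorFlow."""
    return Counter(node.op for node in nodes)


# Stand-in nodes (hypothetical); a real GraphDef node also exposes `.op`.
toy_nodes = [
    SimpleNamespace(name="input", op="Placeholder"),
    SimpleNamespace(name="TRTEngineOp_0", op="TRTEngineOp"),
    SimpleNamespace(name="resize", op="ResizeNearestNeighbor"),
    SimpleNamespace(name="output", op="Identity"),
]

print("TRT engine ops:", count_trt_engine_ops(toy_nodes))
print("Most common ops:", op_histogram(toy_nodes).most_common(3))
```

A count of 0 means no part of the graph was converted, which would match the symptom described above; the histogram can help spot which op types remained outside TensorRT.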
