Performance of nvinfer vs. nvinferserver

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.0
• TensorRT Version: 7.2.2
• NVIDIA GPU Driver Version (valid for GPU only): 460.84
• Issue Type (questions, new requirements, bugs): question

We are using the DeepStream Triton docker 6.0 for this experiment.
We used deepstream_ssd_parser.py to run YOLOv4 in TensorRT format with both the nvinfer and nvinferserver elements and measured the performance difference (the element swap is sketched below).
The results show that nvinferserver is almost 2x slower than nvinfer. Is this result reasonable?
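For reference, a minimal sketch of how the two inference elements can be swapped in the Python pipeline; this assumes the stock deepstream_ssd_parser.py structure, and the config file names are placeholders for whatever YOLOv4 configs are actually used.

```python
import sys
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Pick the inference plugin: nvinfer (in-process TensorRT) or
# nvinferserver (Triton, local or via gRPC).
USE_TRITON = "--triton" in sys.argv
plugin_name = "nvinferserver" if USE_TRITON else "nvinfer"

pgie = Gst.ElementFactory.make(plugin_name, "primary-inference")
if pgie is None:
    sys.stderr.write("Unable to create %s element\n" % plugin_name)
    sys.exit(1)

# Both plugins expose "config-file-path"; the file contents differ
# (nvinfer: key/value text, nvinferserver: protobuf text format).
# The file names below are placeholders.
config_file = ("config_triton_yolov4.txt" if USE_TRITON
               else "config_infer_yolov4.txt")
pgie.set_property("config-file-path", config_file)

# ... the rest of the pipeline (source, streammux, pgie, the custom
# output-parsing probe, fakesink) is unchanged between the two runs.
```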

| deepstream-app (sink: Fakesink), sample_720.h264 | nvinfer | nvinferserver |
|---|---|---|
| Frame count | 1442 | 1442 |
| FPS | 178.02 | 66.36 |

| Python ssd-parser (sink: Fakesink), sample_720.h264 | nvinfer | nvinferserver |
|---|---|---|
| Frame count | 1442 | 1442 |
| FPS | 145.5091789 | 87.31596242 |

We’re investigating and will have a suggestion soon.

Hi @yamiefun,
Sorry for the delay! What is your YOLOv4 model, ONNX or TF?

Thanks!

Hi @mchi,
We used a TensorRT YOLOv4 engine with both nvinfer and nvinferserver.

I’m having the same problem (to clarify, I’m using a remote Triton server via gRPC). While nvinfer is able to achieve about 860 infer/sec, nvinferserver with the same model only gets about 120 infer/sec.

Benchmarking Triton with perf_analyzer also achieves about 860 infer/sec (concurrency level 5, CUDA shared memory, gRPC), so I know that Triton itself is not the bottleneck.

I was expecting nvinferserver to use CUDA shared memory when talking to a remote Triton server, but that does not seem to be the case: Triton does not show any registered CUDA shared-memory regions while my pipeline is running (one way to check is sketched below).
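A minimal sketch of one way to query the registered regions with the Triton Python client; the localhost:8001 gRPC URL is a placeholder and should be replaced with the address of the remote server.

```python
# Query Triton for registered shared-memory regions while the
# DeepStream pipeline is running. Assumes tritonclient is installed
# and the gRPC endpoint is at localhost:8001 (placeholder URL).
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Empty status responses mean no CUDA / system shared-memory regions
# are registered, i.e. input tensors are being sent over the network.
print(client.get_cuda_shared_memory_status())
print(client.get_system_shared_memory_status())
```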

@mchi, can you clarify this? Also, is it possible to make nvinferserver use shared memory?