Which is faster: DeepStream, the TensorRT Python/C++ API, or TRTIS?

Hello all!

For an end-to-end application (read image file -> pre-process -> inference using network -> post-process, without displaying output), is it generally suggested (considering only FPS) to use DeepStream or the TensorRT Python/C++ API?

  1. Is the DeepStream C++ API faster than the TensorRT C++ API for the above pipeline?
  2. When should the entire pipeline be constructed with the TensorRT C++ API, and when should DS be used?
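For context, this is the kind of stage-by-stage timing I have in mind. It's a pure-Python sketch; every stage function is a placeholder, not a real DeepStream or TRT call:

```python
import time

# Placeholder stages -- swap in real image I/O, TRT inference, etc.
def read_image():      return b"\x00" * 1024
def pre_process(img):  return img
def infer(tensor):     return [0.0] * 10   # the DNN part under discussion
def post_process(out): return out

def time_stage(fn, arg, n=100):
    """Average wall-clock seconds per call of one stage."""
    start = time.perf_counter()
    for _ in range(n):
        out = fn(arg)
    return (time.perf_counter() - start) / n, out

img = read_image()
for name, fn, arg in [("pre-process", pre_process, img),
                      ("inference", infer, img),
                      ("post-process", post_process, [0.0] * 10)]:
    secs, _ = time_stage(fn, arg)
    print(f"{name}: {secs * 1e6:.1f} us/frame")
```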

Thanks!

Please check the diagram in https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Overview.html#deepstream-graph-architecture .

TRT covers only the DNN(s) part; the other parts, e.g. image read, conversion, OSD, render, etc., are provided only by DeepStream.

Thanks for the reply @mchi

I actually wanted to ask the following:

  1. Talking exclusively about the DNN part, is the same work done faster by DS or by the TRT C++ API?

  2. If the TRT C++ API does the DNN part faster, will this speed improvement outweigh the speed improvement provided by DS in other parts like image read, pre-processing, and post-processing?

Apologies if I wasn’t clear before.

P.S:
In the first post, by "TRT C++ API" I meant doing the DNN part with the TRT C++ API and the rest with separate C++ code.

Thanks!

Talking exclusively about the DNN part, is the same work done faster by DS or by the TRT C++ API?

For DNN itself, DS calls the TRT API to do inference, so they have the same perf.

But for the whole pipeline, DS has many optimizations around the DNN(s), such as reducing memory copies to save memory traffic (which helps DNN perf) and running several DNNs in parallel.

Thank you!

Also, for the DNN part, how do TRTIS and DS compare?

I’ll change the title of the question accordingly.

Thanks

TRTIS supports different model formats, e.g. TF models, PyTorch models, etc., directly, since TRTIS integrates these frameworks as its backends.

DS integrates TRTIS (Triton) as the gst-nvinferserver plugin.
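For illustration, the two inference plugins slot into an otherwise similar GStreamer pipeline; this is only a rough sketch (the surrounding elements and the config file names are placeholders):

```text
... ! nvstreammux ... ! nvinfer       config-file-path=config_infer_trt.txt    ! ...   # TRT-based inference
... ! nvstreammux ... ! nvinferserver config-file-path=config_infer_triton.txt ! ...   # Triton-based inference
```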

  1. Is ‘support for other frameworks like TF, PyTorch, etc.’ the ONLY reason one might want to use TRTIS?

  2. Also, is it the case that the DNN part of ‘DS with nvinfer’ will be faster than TRTIS (since DS has a TensorRT backend whereas TRTIS doesn’t; correct me if this assumption is wrong)?

Is ‘support for other frameworks like TF, PyTorch, etc.’ the ONLY reason one might want to use TRTIS?

It’s one major reason, but not sure if it’s ONLY.

Also, is it the case that the DNN part of ‘DS with nvinfer’ will be faster than TRTIS?

I think so. TRT is very optimized.

Thanks for the info!

One last query:
It is written on one of NVIDIA’s websites that TRTIS supports TensorRT as well.

  1. The same website says “It runs models concurrently on GPUs maximizing utilization”. Can you please expand on this statement?

  2. I’m developing an application for object detection using yolo. I’m just trying to evaluate which of TRTIS / DS will be faster for my application. Any comments on that?

Thanks for your time!

The same website says “It runs models concurrently on GPUs maximizing utilization”. Can you please expand on this statement?

Yes, you can refer to the Triton doc: https://github.com/triton-inference-server/server/blob/r20.11/docs/architecture.md#concurrent-model-execution
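Concretely, “concurrent model execution” means Triton can run multiple instances of a model (and multiple different models) on the same GPU at the same time. As a sketch, in the model’s config.pbtxt you can request two GPU instances like this:

```text
# config.pbtxt (fragment)
instance_group [
  {
    count: 2        # two copies of the model execute in parallel
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

With more than one instance, requests queued for the model can be dispatched in parallel instead of serially, which is what keeps GPU utilization high.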

I’m developing an application for object detection using yolo. I’m just trying to evaluate which of TRTIS / DS will be faster for my application. Any comments on that?

About perf, I have explained above. You can also benchmark it with TRTIS and DS respectively.
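A minimal way to compare the two setups is to push N frames through each and divide by the elapsed time. A framework-agnostic sketch, where `run_pipeline` is a placeholder for whichever client (DS app or TRTIS client) you are benchmarking:

```python
import time

def run_pipeline(frame):
    """Placeholder: submit one frame through DS or a TRTIS client
    and block until the result is back."""
    return frame

def measure_fps(n_frames=200):
    frame = b"\x00" * 1024          # dummy input frame
    start = time.perf_counter()
    for _ in range(n_frames):
        run_pipeline(frame)
    return n_frames / (time.perf_counter() - start)

print(f"{measure_fps():.1f} FPS")
```

Run the same harness against both backends with the same input and batch settings so the comparison is apples to apples.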
