My current throughput with the above setup is 1.8 GBytes/s (20 Gbit/s network adapter) from DeepStream to Triton Server.
Is there any compression support between DeepStream and Triton Server?
I can’t imagine using a remote Triton Server with 4 models for inference.
Hi @Levi_Pereira ,
Are you using the DeepStream nvinferserver plugin with a remote server?
If so, nvinferserver sends the raw inference data to Triton Server, receives the raw inference output, and then parses it locally; there is no compression option in it.
If throughput is a concern, can you deploy DeepStream on the server and send RTSP/encoded video to it, so that the whole DeepStream pipeline runs there (RTSP/encoded stream → decoding → batching/nvstreammux → inference → …)?
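To put the difference between raw inference data and encoded video in perspective, here is a rough back-of-the-envelope sketch. The model input shape (3x640x640, FP32), the 30 fps frame rate, and the 4 Mbit/s encoded bitrate are all assumptions for illustration; they are not taken from the setup in this thread. Only the 1.8 GBytes/s figure comes from the original post.

```python
# Rough estimate of raw tensor traffic vs. encoded video traffic.
# All shapes and bitrates below are illustrative assumptions.

def raw_tensor_bytes_per_second(channels, height, width, bytes_per_element, fps):
    """Bytes per second needed to ship one stream of raw input tensors."""
    return channels * height * width * bytes_per_element * fps

# Hypothetical model input: 3x640x640, FP32 (4 bytes), one 30 fps stream
per_stream = raw_tensor_bytes_per_second(3, 640, 640, 4, 30)

# Assumed 4 Mbit/s encoded stream, converted to bytes per second
encoded_stream = 4_000_000 / 8

print(f"raw tensor input: {per_stream / 1e6:.1f} MB/s per stream")
print(f"encoded video:    {encoded_stream / 1e6:.1f} MB/s per stream")
print(f"ratio:            {per_stream / encoded_stream:.0f}x")

# How many such streams fit into the 1.8 GBytes/s reported in this thread
print(f"streams at 1.8 GB/s: {1.8e9 / per_stream:.1f}")
```

Under these assumptions the raw tensor traffic is a couple of orders of magnitude larger than the encoded video, which is why moving the decode step next to the inference server reduces the network load so much.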
2x GPU RTX 3090 in one Physical Host (Triton Server)
Inference
Since the RTX 3090 is limited to 3 concurrent encode sessions, I need to run DeepStream on the A16, which has no encode session limit.
The issue is that the network bandwidth required between these servers is too high. Since gRPC supports compression, maybe it could reduce the network throughput between DeepStream and the inference server.
If compression over gRPC is not possible, it would be good if NVIDIA provided documentation on the recommended hardware requirements for deploying a physical inference server (such as InfiniBand). That would make it easier to discuss the recommended hardware for the environment with our customers.
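On the compression question: gRPC as a protocol does define gzip and deflate message compression, but how much it would help depends on how compressible the raw tensors actually are. The sketch below is one way to get a feel for that with Python's standard `zlib` (deflate). The payloads are synthetic stand-ins, not real frames, and the helper names are hypothetical; real camera content would need to be measured the same way before drawing conclusions.

```python
import random
import struct
import zlib

def compression_ratio(payload: bytes, level: int = 1) -> float:
    """Return original_size / compressed_size for one zlib (deflate) pass."""
    return len(payload) / len(zlib.compress(payload, level))

def as_float32(values):
    """Pack values as little-endian FP32, like a raw FP32 input tensor."""
    return struct.pack(f"<{len(values)}f", *values)

random.seed(0)
n = 3 * 64 * 64  # small stand-in for a 3xHxW tensor

# Smooth synthetic content: repeating 8-bit ramp normalized to [0, 1]
smooth = as_float32([(i % 256) / 255.0 for i in range(n)])

# Noisy synthetic content: random 8-bit pixel values normalized to [0, 1]
noisy = as_float32([random.randrange(256) / 255.0 for _ in range(n)])

for name, payload in (("smooth", smooth), ("noisy", noisy)):
    print(f"{name}: {len(payload)} bytes, ratio {compression_ratio(payload):.2f}x")
```

Compression also costs CPU time on both ends, so a low compression level (as assumed here with `level=1`) is usually the starting point when the goal is throughput rather than size.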
There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks
Hi @Levi_Pereira ,
Sorry for the delay! Understood your requirement.
Since the RTX 3090 is limited to 3 concurrent encode sessions, I need to run DeepStream on the A16, which has no encode session limit.
Some CPUs also have powerful encoding capability, so you could try software (SW) encoding on the CPU.