High Network (LAN) bandwidth required between Deepstream and Triton Server using 2 Physical Hosts

I’m trying to separate Deepstream and Triton Server on different physical hosts.

I’m currently processing around 900 frames/second, with a frame size of 704x480 and a model input size of 416x416.

Deepstream Config

  backend {
    triton {
      model_name: "yolov7-416"
      version: -1
      grpc {
        url: "0.0.0.0:8001"
      }
    }
  }

With the above setup, my current throughput from Deepstream to Triton Server is 1.8 GBytes/s (on a 20 Gbit/s network adapter).
Is there any compression support between Deepstream and Triton Server?
I can’t imagine using a remote Triton Server with 4 models for inference at this bandwidth cost.
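For reference, the reported figure is close to what the raw input tensors alone would need. The sketch below is only a back-of-the-envelope estimate: it assumes the 416x416 model input is sent as a 3-channel FP32 tensor, which is not stated in the thread, and the function name is made up for illustration.

```python
# Estimate the raw tensor traffic sent to a remote Triton server.
# ASSUMPTION (not confirmed in the thread): 3 channels, FP32 (4 bytes/value).

def tensor_bandwidth_bytes_per_s(fps, width, height, channels=3, bytes_per_value=4):
    """Bytes per second needed to ship uncompressed input tensors."""
    return fps * width * height * channels * bytes_per_value

rate = tensor_bandwidth_bytes_per_s(fps=900, width=416, height=416)
print(f"{rate / 1e9:.2f} GB/s")        # 1.87 GB/s
print(f"{rate * 8 / 1e9:.2f} Gbit/s")  # 14.95 Gbit/s
```

Under these assumptions the result lands near the measured 1.8 GBytes/s, which suggests the traffic is dominated by uncompressed input tensors. Calling the same function with `bytes_per_value=1` gives roughly 0.47 GB/s, so the tensor data type matters a lot here, if the model and pipeline can accept a smaller one.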

Hi @Levi_Pereira ,
Are you using DeepStream's nvinferserver with a remote server?
If so, nvinferserver sends the raw inference input data to the Triton server, receives the raw inference output, and then parses it locally. There is no compression option in it.

If you are concerned about the throughput, can you deploy DeepStream on the server and send RTSP/encoded video to it, running the whole DeepStream pipeline there (rtsp/encoded stream → decoding → batching/nvstreammux → inference → …)?
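To give a rough idea of why sending encoded video is so much cheaper on the network, here is a small estimate. The per-stream frame rate (30 fps) and bitrate (4 Mbit/s) are hypothetical values picked for illustration, not numbers from this thread; only the 900 frames/second and 1.8 GBytes/s come from the post above.

```python
# Rough comparison: encoded video over the LAN vs. raw input tensors.
# ASSUMPTIONS (hypothetical): 30 fps per camera, 4 Mbit/s per encoded stream.

TOTAL_FPS = 900              # from the original post
FPS_PER_STREAM = 30          # assumed
BITRATE_MBIT_PER_STREAM = 4  # assumed

def encoded_bandwidth_mbit(total_fps, fps_per_stream, mbit_per_stream):
    """Total Mbit/s if every stream is sent as encoded video."""
    streams = total_fps / fps_per_stream
    return streams * mbit_per_stream

encoded = encoded_bandwidth_mbit(TOTAL_FPS, FPS_PER_STREAM, BITRATE_MBIT_PER_STREAM)
raw_mbit = 1.8 * 8 * 1000    # 1.8 GBytes/s reported in the original post
print(f"encoded: {encoded:.0f} Mbit/s, raw tensors: {raw_mbit:.0f} Mbit/s")
```

With these assumed values the encoded streams need on the order of a hundred Mbit/s, against roughly 14 Gbit/s for the raw tensors. The real ratio depends on the codec and bitrate actually used.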

My Setup:

8x GPU A16 in one Physical Host (Deepstream)

  • Decode
  • Tracking
  • Encode

2x GPU RTX 3090 in one Physical Host (Triton Server)

  • Inference

Since the RTX 3090 is limited to 3 concurrent encode sessions, I need to run Deepstream on the A16s, which have no encode session limit.

The issue is that the network bandwidth required between these servers is too high. Since gRPC supports compression, maybe this could reduce the network throughput between Deepstream and the inference server.

If compression over gRPC is not possible, it would be good if Nvidia provided documentation on the recommended hardware requirements for deploying a physical inference server (such as InfiniBand). That would make it easier to discuss the recommended hardware for the environment with our customers.

Hi @Levi_Pereira ,
Sorry for the delay! I understand your requirement.

Since the RTX 3090 is limited to 3 concurrent encode sessions, I need to run Deepstream on the A16s, which have no encode session limit.

Some CPUs also have powerful encoding capability, so you could try software (SW) encoding on the CPU.