Is it possible to run Triton Server on a GPU device and GStreamer with nvinferserver on a CPU-only device?

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.4
• Issue Type( questions, new requirements, bugs) Question
Hi, I am currently running DeepStream with nvinfer successfully for our application, but we are hitting GPU quota limits in Azure that restrict the number of T4 VMs we can use. We would like to move to multiple models and want to check whether the following setup is viable:

  • GPU devices: 4-vCPU servers (with a T4 GPU) running only Triton Inference Server, with the models attached to an MLOps pipeline.

  • CPU-only devices: 4-vCPU servers (without a GPU) running a CPU-only GStreamer pipeline that uses the “nvinferserver” plugin to run inference through gRPC calls to Triton. All other parts of the pipeline use CPU-only alternatives, including tracking.

We would have multiple CPU-only devices sharing the same Triton Server to perform inference on multiple models.

Is that setup viable? Can we run the “nvinferserver” plugin on a CPU-only device connected to Triton Inference Server over gRPC? If so, how can I compile only the nvinferserver plugin without CUDA access?


The “nvinferserver” plugin needs a GPU even in gRPC mode, so this setup is not possible.
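
For reference, here is a minimal sketch of what an nvinferserver configuration in gRPC mode typically looks like. The model name and server URL are hypothetical placeholders, and the exact set of fields should be checked against the Gst-nvinferserver documentation for your DeepStream version. Note that the configuration still carries a gpu_id for the machine running the plugin; only the model execution is moved to the remote Triton server.

```text
# Sketch only: "my_detector" and "triton-host:8001" are placeholders, not values from this thread.
infer_config {
  unique_id: 1
  gpu_id: 0                      # GPU on the machine running the plugin
  backend {
    triton {
      model_name: "my_detector"  # hypothetical model served by Triton
      version: -1                # -1 selects the latest available version
      grpc {
        url: "triton-host:8001"  # 8001 is Triton's default gRPC port
      }
    }
  }
}
```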

Thanks for the answer. Is there any other CPU-only GStreamer plugin that can communicate with Triton using gRPC?

I don’t think there is any other GStreamer plugin for this case.

