Can an A100 GPU handle 8 parallel YOLOv11-L real-time inference streams at 1080p 30 FPS?

Hello,

I am working on a real-time object detection project at my company, and we plan to use YOLO for inference on multiple IP camera streams. We currently have no on-premise GPU server, so we are considering purchasing a system equipped with an NVIDIA A100 GPU; we also intend to run an LLM on this machine in the future.

Our goal is to run 8 separate IP camera streams, each at:

  • Resolution: 1920 × 1080

  • Target inference speed: 30 FPS

  • Model: YOLOv11-L (Ultralytics)

Before making a purchase, I would like to confirm the following:

1. Can a single A100 GPU run 8 YOLOv11-L models in parallel at 1080p 30 FPS?

If anyone has benchmark experience with YOLOv8-L or YOLOv11-L on an A100, I would appreciate performance guidance, especially regarding GPU throughput, memory usage, and expected real-time stability across 8 streams.
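
For reference, below is a minimal benchmark sketch of what we plan to run once hardware is available. It assumes the Ultralytics Python API; the yolo11l.pt weights filename and the dummy frames are placeholders standing in for the real camera streams.

```python
# Rough throughput check: one 8-frame batch per iteration simulates one
# synchronized frame from each of the 8 streams.
import time

import numpy as np
from ultralytics import YOLO

model = YOLO("yolo11l.pt")  # YOLOv11-L weights (filename is an assumption)

# Dummy 1080p frames; Ultralytics letterboxes them to imgsz internally.
frames = [np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(8)]

# Warm-up so CUDA context setup does not skew the timing.
model.predict(frames, imgsz=640, device=0, verbose=False)

n_iters = 50
start = time.perf_counter()
for _ in range(n_iters):
    model.predict(frames, imgsz=640, device=0, verbose=False)
elapsed = time.perf_counter() - start

batch_latency_ms = elapsed / n_iters * 1000
print(f"avg latency per 8-frame batch: {batch_latency_ms:.1f} ms")
# Sustaining 8 x 30 FPS requires each 8-frame batch to finish in under ~33 ms.
print(f"effective per-stream FPS: {n_iters / elapsed:.1f}")
```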

2. If the A100 cannot reliably support this load, what edge system would you recommend?

We are open to alternatives such as:

  • Jetson AGX Orin industrial edge PCs

  • x86-based industrial computers with RTX A2000/A4000

  • Any proven hardware configuration capable of handling 8 channels of 1080p YOLO inference at 30 FPS

If possible, we prefer a design suitable for factory environments.

Any recommendations, benchmarks, or architectural guidance would be extremely helpful.

Thank you in advance for your support.

What’s the input size of the model used during inference?

In my current implementation, I do not manually set the input tensor size.
The raw 1920×1080 frame is passed directly to the YOLO detector, and the model internally resizes it to its default inference size (typically 640×640 for Ultralytics YOLO).
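
Concretely, the call looks like the sketch below (assuming the Ultralytics Python API; the yolo11l.pt weights filename and the dummy frame are placeholders). The effective inference resolution is therefore 640×640 unless imgsz is passed explicitly.

```python
import numpy as np
from ultralytics import YOLO

# Stand-in for a raw 1080p frame grabbed from an IP camera (HWC, BGR).
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

model = YOLO("yolo11l.pt")  # weights filename is an assumption

# No imgsz given: Ultralytics letterboxes the 1920x1080 frame down to
# its default inference size of 640 before running the network.
results = model.predict(frame, verbose=False)

# Same call with the size made explicit; raising it (e.g. 1280) keeps
# more detail for small objects but roughly quadruples compute per frame.
results = model.predict(frame, imgsz=1280, verbose=False)
```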