Ideas to maximize throughput using TensorRT

Hello everyone,

Description

I’m wondering whether running multiple instances of a TensorRT model would help increase inference throughput.

If you have any other ideas to maximize throughput or minimize latency for a single batch, I’m open to those as well!

Constraints

  • I’m open to using any kind of GPU or technical solution.
  • The batch size must be 1.

Many thanks,
Vincent

Hi @vincent-974,
The link below should help you improve performance.
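
In the meantime, the usual pattern for concurrent batch-1 inference is one engine shared across several execution contexts, each paired with its own CUDA stream, so that copies and kernels from different requests can overlap on the GPU. Whether this actually raises throughput depends on whether a single context already saturates the GPU; trtexec’s `--streams=N` option is a quick way to measure that before writing any code. Below is a minimal sketch of the idea, assuming TensorRT 8.x (`execute_async_v2`), pycuda, and a serialized engine saved as `model.plan` — the file name, input shape, and binding order are placeholders you would adjust for your model:

```python
# Minimal sketch: two execution contexts sharing one engine, each with its
# own CUDA stream, so two batch-1 requests can overlap on the GPU.
# Assumes TensorRT 8.x, binding 0 = the only input, binding 1 = the only
# output, and a serialized engine at "model.plan" (all placeholders).
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a default CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.plan", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

def make_instance(engine):
    """One 'instance': its own execution context, stream, and buffers."""
    ctx = engine.create_execution_context()
    stream = cuda.Stream()
    bindings, host, dev = [], [], []
    for i in range(engine.num_bindings):
        dtype = trt.nptype(engine.get_binding_dtype(i))
        size = trt.volume(engine.get_binding_shape(i))
        h = cuda.pagelocked_empty(size, dtype)  # pinned memory for async copies
        d = cuda.mem_alloc(h.nbytes)
        host.append(h)
        dev.append(d)
        bindings.append(int(d))
    return ctx, stream, bindings, host, dev

instances = [make_instance(engine) for _ in range(2)]

def infer_async(instance, x):
    """Enqueue one batch-1 inference; caller syncs the stream for the result."""
    ctx, stream, bindings, host, dev = instance
    np.copyto(host[0], x.ravel())
    cuda.memcpy_htod_async(dev[0], host[0], stream)
    ctx.execute_async_v2(bindings, stream.handle)
    cuda.memcpy_dtoh_async(host[1], dev[1], stream)
    return stream

# Two batch-1 requests enqueued back to back; the GPU may overlap them.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
streams = [infer_async(inst, x) for inst in instances]
for s in streams:
    s.synchronize()  # results now sit in each instance's host[1] buffer
```

If adding a second context/stream doesn’t move the numbers, the engine is likely already GPU-bound at batch 1, and reduced precision (e.g. building the engine with FP16 enabled) is usually the next lever for latency.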

Thanks!