How to run multiple TensorRT model instances on a single GPU efficiently?

I am running 4 TensorRT model instances for inference on a single GPU. GPU utilization sits around 80%, and the latency of each model increases as instances are added. Can TensorRT isolate each model on the GPU to avoid resource contention? What else can I do to improve their inference speed?
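For context, here is a simplified sketch of roughly how I set things up (TensorRT 8.x Python API with pycuda; `model.plan` and the static binding shapes are placeholders for my actual models): each instance gets its own execution context and CUDA stream from a shared engine.

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on the default GPU
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(plan_path):
    """Deserialize a prebuilt TensorRT engine from disk."""
    with open(plan_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

# One engine can back several execution contexts; each context gets
# its own CUDA stream so the instances can be enqueued concurrently.
engine = load_engine("model.plan")  # placeholder path
num_instances = 4
contexts = [engine.create_execution_context() for _ in range(num_instances)]
streams = [cuda.Stream() for _ in range(num_instances)]

# Separate device buffers per instance (bindings must not be shared
# across contexts running at the same time). Assumes static shapes.
buffers = []
for _ in range(num_instances):
    bindings = []
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i))
        dtype = np.dtype(trt.nptype(engine.get_binding_dtype(i)))
        bindings.append(int(cuda.mem_alloc(size * dtype.itemsize)))
    buffers.append(bindings)

# Enqueue all instances without blocking between them.
for ctx, stream, bindings in zip(contexts, streams, buffers):
    ctx.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

for stream in streams:
    stream.synchronize()
```

Even with separate streams like this, the four instances still share the same SMs and memory bandwidth, which is where I suspect the latency increase comes from.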
