I am running a TensorRT model on Tesla M60 cards (Amazon g3.16xlarge instance).
I am seeing odd scaling behavior with TensorRT Inference Server (TRTIS): the model running on 4 GPUs gets only a 25% FPS improvement over the same model running on 2 GPUs.
Does anyone know what the bottleneck could be here?
I checked the same model on GTX 1070 cards, and there the performance doubles when the number of GPUs is doubled.
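
To put a number on the gap, here is a minimal Python sketch of the scaling-efficiency arithmetic. The function name `scaling_efficiency` is my own, and the FPS values are normalized to the smaller setup (baseline = 1.0) rather than measured numbers; the GPU counts for the GTX 1070 case are an assumption, since only the ratio matters.

```python
def scaling_efficiency(fps_base, fps_scaled, gpus_base, gpus_scaled):
    """Return achieved speedup divided by ideal (linear) speedup."""
    speedup = fps_scaled / fps_base    # measured throughput ratio
    ideal = gpus_scaled / gpus_base    # what linear scaling would give
    return speedup / ideal


# FPS is normalized to the smaller configuration, not an actual measurement.
# M60: going from 2 to 4 GPUs gives a 25% FPS improvement (1.0 -> 1.25).
m60 = scaling_efficiency(1.0, 1.25, gpus_base=2, gpus_scaled=4)

# GTX 1070: throughput doubles when GPUs double (1 -> 2 GPUs is assumed here).
gtx = scaling_efficiency(1.0, 2.0, gpus_base=1, gpus_scaled=2)

print(f"M60 scaling efficiency:      {m60:.1%}")
print(f"GTX 1070 scaling efficiency: {gtx:.1%}")
```

So the M60 setup reaches roughly 62.5% of linear scaling for the 2 to 4 GPU step, while the GTX 1070 setup scales linearly.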