I have a primary-gie and a secondary-gie for inference so when multiple primary instances were found, the same amount of secondary inference were triggered. As a result, the secondary-gie took significantly more time than the primary-gie even though the secondary-gie utilizes a smaller model.
I tried creating a secondary model with batch-size 2 but it ended up producing very bad results. So how can I improve the latency of secondary-gie?
My setup is the following:
NVIDIA GPU Driver Version 10.2