I have an application that will be doing live stream decoding for 2000 live sources, requiring 3000 to 4000 inf/sec at mixed precision with YOLOv5m, and I have been comparing GPUs to find the best combination to achieve this.
Currently, I have an RTX A4000, and it gives me 482 inf/sec with TensorRT on Triton for one GPU, which means I would need about 9 RTX A4000s to achieve the target inference rate.
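For reference, this is the simple sizing arithmetic behind my 9-GPU estimate (it assumes throughput scales roughly linearly as GPUs are added, which may not hold exactly in practice):

```python
import math

# Measured on my setup: RTX A4000 running YOLOv5m via TensorRT on Triton
per_gpu_inf_per_sec = 482

# Upper end of my 3000-4000 inf/sec requirement
target_inf_per_sec = 4000

# Assumes near-linear scaling across GPUs (an approximation)
gpus_needed = math.ceil(target_inf_per_sec / per_gpu_inf_per_sec)
print(gpus_needed)  # 9
```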
I looked at the A30 Tensor Core GPU specs, and it doesn’t seem to offer much more than the RTX A4000 in terms of CUDA cores, Tensor Cores, and FP32 performance, but I don’t have access to an A30 to evaluate its performance myself.
So, could you please explain the performance difference between the two GPUs? If I go with the A30, how many GPUs would I need to reach the throughput I mentioned? Any other hardware recommendations would also be welcome.
Thanks in advance.