Hi, the NVIDIA T4 datasheet states that mixed precision can reach 65 TFLOPS. I ran YOLOv3 on a P100 and a T4, and both run at almost the same speed:
P100 - ~160 fps (FP16)
T4 - ~170 fps (FP16)
Also, regarding network speed at different precisions, my assumption was that the ordering is “Single > Mixed > Double”.
How can I reproduce the 65 TFLOPS performance on the NVIDIA T4?
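To make the gap concrete, here is a rough sketch of how the fps numbers above could be converted into achieved throughput. The per-frame cost is an assumption on my part: about 65.9 GFLOPs per frame is the figure commonly quoted for YOLOv3 at 416x416 input, and the actual cost depends on the input size used. The names `ASSUMED_GFLOPS_PER_FRAME` and `achieved_tflops` are just illustrative.

```python
# Rough estimate of achieved throughput from a measured frame rate.
# ASSUMPTION: ~65.9 GFLOPs per frame is the figure commonly quoted for
# YOLOv3 at 416x416 input; the real cost depends on the input size used.
ASSUMED_GFLOPS_PER_FRAME = 65.9
T4_PEAK_TFLOPS = 65.0  # mixed-precision peak quoted in the T4 datasheet


def achieved_tflops(fps: float,
                    gflops_per_frame: float = ASSUMED_GFLOPS_PER_FRAME) -> float:
    """Convert frames per second into achieved TFLOPS."""
    return fps * gflops_per_frame / 1000.0


if __name__ == "__main__":
    # Frame rates measured in FP16, as listed above.
    for name, fps in [("P100", 160.0), ("T4", 170.0)]:
        print(f"{name}: ~{achieved_tflops(fps):.1f} TFLOPS achieved")

    t4_fraction = achieved_tflops(170.0) / T4_PEAK_TFLOPS
    print(f"T4 fraction of datasheet peak: ~{t4_fraction:.0%}")
```

If that per-frame assumption is roughly right, the T4 run is using only a small fraction of the datasheet peak, which is the gap I am trying to understand.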