How can I get 65 TFLOPS performance with the NVIDIA T4?

Description

Hi, the NVIDIA T4 datasheet shows that mixed precision can achieve 65 TFLOPS. I have run YOLOv3 on a P100 and a T4, and both run at almost the same speed. How do I get the 65 TFLOPS performance mentioned in the datasheet?

T4 Datasheet link: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/tesla-t4/t4-tensor-core-datasheet-951643.pdf

P100 - ~160 fps (FP16)
T4 - ~170 fps (FP16)

Also, regarding the speed of the network at different precisions, my assumption was that “Single > Mixed > Double”.

How can I reproduce the 65 TFLOPS performance on the NVIDIA T4?

Hi,

Any information regarding this issue?

Thanks

Hi @surya.22091994,
TFLOPS and fps are two different metrics. Can you please elaborate on the test run and on how you are calculating TFLOPS from fps?

Thanks!

Hi @AakankshaS

Thanks for the reply. The following is how I estimated the performance of YOLO on the T4.

Assumptions:

  1. I am running an FP16 model.
  2. Half-precision (FP16) performance is greater than mixed-precision (FP16+FP32) performance.
  3. TFLOPS is directly proportional to fps (unless there are other bottlenecks; please mention anything else you think might be a bottleneck).
  4. All the fps values mentioned are at 100% GPU utilization.
  5. We might not be utilizing the Tensor Cores of the T4. (Could you provide some documentation on how to leverage Tensor Core computation?)
  6. The T4 datasheet lists mixed-precision (FP16+FP32) performance as 65 TFLOPS (https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/tesla-t4/t4-tensor-core-datasheet-951643.pdf)
  7. The P100 datasheet lists half-precision performance as 18.7 TFLOPS (https://images.nvidia.com/content/tesla/pdf/nvidia-tesla-p100-PCIe-datasheet.pdf)

Speed calculation:
If, on a GPU that provides 18.7 TFLOPS, YOLO runs at 160 fps with 100% GPU utilization,
then, on a GPU that provides 65 TFLOPS, YOLO should run at about 555 fps with 100% GPU utilization (with no bottlenecks).
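
The proportional estimate above can be written out as a short Python sketch. It relies only on assumption 3 (fps scales linearly with the datasheet peak TFLOPS), which is an assumption of this post and not a verified performance model. The function name `projected_fps` is made up for illustration. The inputs are the numbers quoted in this thread.

```python
def projected_fps(measured_fps, measured_tflops, target_tflops):
    """Scale a measured fps linearly with peak TFLOPS.

    This encodes assumption 3 only: no memory-bandwidth, clock, or
    kernel-efficiency effects are modeled.
    """
    return measured_fps * target_tflops / measured_tflops


# Figures quoted in this thread (datasheet peaks and measured fps).
p100_fps = 160.0       # measured, FP16
p100_tflops = 18.7     # P100 datasheet, half precision
t4_tflops = 65.0       # T4 datasheet, mixed precision
t4_measured_fps = 170.0

expected = projected_fps(p100_fps, p100_tflops, t4_tflops)
fraction = t4_measured_fps / expected

print(f"Projected T4 fps under linear scaling: {expected:.0f}")
print(f"Measured fps as a share of the projection: {fraction:.0%}")
```

The exact ratio comes out slightly above the rounded figure of 555 fps used here, and the measured 170 fps is well under a third of that projection, which is the gap this question is about.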

So I was asking how I can get 555 fps on the T4. Could you please shed some light on where I might have gone wrong, in case it is wrong to expect 555 fps on the T4?