How is the A30 GPU faster than the A10 GPU?

Hi,

I’ve been comparing the specs of the A10 vs. the A30 for AI inference workloads, and I don’t understand how the A30 is faster than the A10 in FP16 Tensor Core compute:

A30 has 3,584 CUDA cores and 224 third-gen Tensor Cores.
A10 has 9,216 CUDA cores and 288 third-gen Tensor Cores.

Yet the A30 offers 165 dense FP16 Tensor Core TFLOPS vs. the A10’s 125.

Could you explain where the A30’s extra performance comes from? I’ve been estimating inference performance from the number of CUDA cores and Tensor Cores alone, but it seems I’m missing something.
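For reference, here is the kind of back-of-envelope estimate I mean. If I assume the per-Tensor-Core per-clock rate differs between the two dies (256 FP16 FMAs/clock on the GA100-class die in the A30 vs. 128 on the GA102-class die in the A10, per NVIDIA's Ampere whitepapers) and plug in approximate boost clocks, the datasheet numbers come out. The per-core rates and clock values below are assumptions, not measurements:

```python
# Theoretical dense FP16 Tensor Core throughput:
# TFLOPS = tensor_cores * FMAs/clock/core * 2 FLOPs/FMA * clock (GHz) / 1000
# Per-core FMA rates and boost clocks are assumed from NVIDIA's
# Ampere whitepapers/datasheets, not measured.

def fp16_tc_tflops(tensor_cores, fma_per_clock, boost_ghz):
    """Dense FP16 Tensor Core TFLOPS; one FMA counts as 2 FLOPs."""
    return tensor_cores * fma_per_clock * 2 * boost_ghz / 1e3

# A30: GA100-class die, 224 Tensor Cores, assumed 256 FP16 FMAs/clock each,
# assumed ~1.44 GHz boost clock
print(f"A30: {fp16_tc_tflops(224, 256, 1.44):.0f} TFLOPS")   # ~165

# A10: GA102-class die, 288 Tensor Cores, assumed 128 FP16 FMAs/clock each,
# assumed ~1.695 GHz boost clock
print(f"A10: {fp16_tc_tflops(288, 128, 1.695):.0f} TFLOPS")  # ~125
```

So if these assumptions hold, the gap would come from each A30 Tensor Core doing twice the FP16 work per clock, which raw core counts don't show. Is that the right way to read it?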

Thanks a lot.