I see here that it has the following specs:
Single-precision performance 27.8 TFLOPS
RT Core performance 54.2 TFLOPS
Tensor performance 222.2 TFLOPS
I’m trying to work out how to compare cards to each other in order to find the most cost-efficient card for my application. My application is multiple streams (hundreds of cameras) running object detection.
I don’t think we’re interested in RT Cores, since they’re for ray tracing, so we can scrap them.
I think Single-precision is FP32 performance.
Tensor performance confuses me a bit - given it’s such a big number, it might refer to INT4 performance, which isn’t applicable to me.
If I have a model on an RTX A5000 that gets, say, 50 fps at single precision, how do we work out how this model will perform on e.g. an RTX A6000 (single-precision 38.7 TFLOPS, Tensor 309.7 TFLOPS)? Does it scale linearly with either spec?
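For what it’s worth, here is the naive linear-scaling estimate I have in mind, as a sketch. It assumes throughput is purely compute-bound and scales with the quoted peak TFLOPS, which real workloads rarely do (memory bandwidth, batch size, and Tensor Core utilisation all matter), so treat the result as an upper bound:

```python
def estimate_fps(fps_measured: float, tflops_measured: float,
                 tflops_target: float) -> float:
    """Naively scale a measured frame rate by the ratio of peak TFLOPS.

    Assumes the model is fully compute-bound; real scaling is usually
    worse because of memory bandwidth and launch/overhead effects.
    """
    return fps_measured * (tflops_target / tflops_measured)

# RTX A5000 FP32 = 27.8 TFLOPS, RTX A6000 FP32 = 38.7 TFLOPS
print(estimate_fps(50, 27.8, 38.7))  # ~69.6 fps, optimistic upper bound
```

The only reliable answer is to benchmark the actual model on both cards; this kind of ratio is just a first-pass filter.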
Looking at the GPU Support Matrix here: Support Matrix :: NVIDIA Deep Learning TensorRT Documentation, and cross-referencing against CUDA GPUs - Compute Capability | NVIDIA Developer, it seems that the RTX A5000 is capable of half-precision FP16 and also INT8 - where would I find the specs for these? A lot of edge devices (some Jetsons, the Google Coral TPU) quote their specs in TOPS, so it would be good to compare INT8 directly.
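Once the INT8 figures are tracked down, the comparison I want to do looks something like this. All the TOPS and price numbers below are placeholders to be replaced with vendor-quoted figures (and note some vendors quote sparsity-accelerated numbers, so they aren’t always apples-to-apples):

```python
# Placeholder cost-efficiency comparison in a common INT8 TOPS unit.
# Every number here is a stand-in, NOT a real spec or price.
devices = {
    "RTX A5000":       {"int8_tops": 200.0, "price_usd": 2000.0},  # placeholder
    "Jetson AGX Orin": {"int8_tops": 150.0, "price_usd": 2000.0},  # placeholder
}

def tops_per_dollar(d: dict) -> float:
    """Peak INT8 TOPS per dollar of purchase price."""
    return d["int8_tops"] / d["price_usd"]

for name, spec in devices.items():
    print(f"{name}: {tops_per_dollar(spec):.3f} TOPS/$")
```

For a cameras-per-dollar number you would divide each card’s measured streams-per-card by its price instead; peak TOPS is only a proxy.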