Hi all,
Can someone please help explain the following benchmark metrics and how they relate to each other?
1. Inferences per second (for a given data type and network type): Does “inference” here mean one complete pass through the DNN, from input to output? E.g. for ResNet50 given images of cats as input, is it the number of cat images the network can classify per second?
2. Images per second: For a CNN, how does this relate to inferences per second in 1 above?
3. Latency: Does this mean the end-to-end delay measured at the system level? And can I assume it is always > 1/(inferences per second)?
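To make the latency question concrete, here is a rough sketch of the arithmetic I have in mind. It rests on my own assumptions, not on any benchmark's definition: a batch of `batch_size` inputs takes `batch_latency_s` seconds end to end, and `concurrent_streams` such batches can be in flight at once. The function name and all the numbers are hypothetical.

```python
def throughput(batch_size: int, batch_latency_s: float, concurrent_streams: int = 1) -> float:
    """Inferences per second under the assumptions stated above.

    Each of `concurrent_streams` in-flight batches completes `batch_size`
    inferences every `batch_latency_s` seconds.
    """
    return batch_size * concurrent_streams / batch_latency_s


# Hypothetical configurations (batch size, batch latency in seconds, streams).
# These are made-up values for illustration, not measurements.
configs = [(1, 0.005, 1), (32, 0.040, 1), (32, 0.040, 2)]

for batch_size, latency_s, streams in configs:
    ips = throughput(batch_size, latency_s, streams)
    print(
        f"batch={batch_size} streams={streams} "
        f"latency={latency_s * 1e3:.1f} ms "
        f"throughput={ips:.0f} inf/s "
        f"1/throughput={1e3 / ips:.3f} ms"
    )
```

If this model is right, latency equals 1/(inferences per second) only when a single input is processed at a time, and is larger by a factor of `batch_size * concurrent_streams` otherwise. Is that the correct way to read these numbers?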
Thanks a lot.