How to understand Inference performance benchmarks

Hi all,

Can someone please help explain the following benchmarks and the relationships between them?

  1. Inferences per second (for a given data type and NN type) — does this “inference” refer to the complete process from input to output of the DNN? E.g., for ResNet50, given images of cats as input, is this the number of cat images that can be classified by the network per second?

  2. Images per second — for a CNN, how does this relate to inferences per second in point 1 above?

  3. Latency — does this refer to the end-to-end delay measured at the system level? And can I assume it is always > 1/(inferences per second)?

thanks a lot

  1. Yes, although be aware that batching affects this. It refers to the time taken to present an input and acquire the output from the NN. If batching is in effect, then that time is for a whole batch, and the images-per-second figure is scaled from it by the number of images in the batch (see the sketch after this list).

  2. I’m not aware that a CNN makes any difference here. Many of the reported benchmarks (e.g. ResNet50) are for CNNs (ResNet50 is a CNN). For an image-classification network, one input is one image, so images per second is just inferences per second scaled by the batch size.

  3. Latency is just the time from presentation of input to acquisition of output. It doesn’t presume anything about “system level”, just the application under test (e.g. TF, TRT, or whatever).
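A minimal sketch of how the three numbers relate, assuming PyTorch with a recent torchvision (the model choice, batch size, and dummy inputs are only for illustration; the same arithmetic applies to TF, TRT, or any other runtime):

```python
import time
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()    # illustrative model; weights don't matter for timing arithmetic
batch_size = 8                                  # illustrative batch size
images = torch.randn(batch_size, 3, 224, 224)   # dummy input batch

with torch.no_grad():
    model(images)                               # warm-up so one-time setup cost isn't counted as latency

    start = time.perf_counter()
    model(images)                               # one "inference" = one batched forward pass, input to output
    batch_latency = time.perf_counter() - start

inferences_per_sec = 1.0 / batch_latency        # forward passes (batches) per second
images_per_sec = batch_size / batch_latency     # scaled by the number of images in the batch

print(f"latency per batch: {batch_latency * 1e3:.1f} ms")
print(f"inferences/sec:    {inferences_per_sec:.1f}")
print(f"images/sec:        {images_per_sec:.1f}")
```

With batch_size = 1, images/sec is simply 1/latency. With larger batches, throughput (images/sec) usually improves while the per-batch latency grows, which is why benchmarks report latency and throughput separately.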