Performance data (latency) for VGG16 layer-by-layer inference on T4


I am looking for published performance data (latency in milliseconds) for inference on a Tesla T4 with the VGG16 CNN.
Specifically, I need layer-by-layer latency when running inference with the VGG16 model on the ImageNet dataset (or a similar dataset).

I am looking for latency data (from the start of a layer's processing on the T4 to the end of processing of that same layer) listed for each layer, for example:

CONV1 layer - x1 ms
CONV2 layer - x2 ms

Fully_connected FC8 layer - y_fc8 ms
Fully_connected FC7 layer - y_fc7 ms
Fully_connected FC6 layer - y_fc6 ms

These are the layers I’m interested in. I have a VLSI hardware background and I’m familiar with (multi-cycle) hardware pipeline stages that have start/done processing flags per stage; these flags allow easy and accurate per-stage latency measurements in hardware. Intuitively, similar start/done flags for each CNN layer could be used to profile inference latency per layer. Perhaps the T4 has such start/done flags, and software applications have used them to extract layer-by-layer inference latency?
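
To make the start/done idea concrete, here is a minimal Python sketch of the kind of per-layer measurement I have in mind. It is only an illustration under stated assumptions: the function `profile_layers` and the toy "layers" are hypothetical stand-ins that run on the CPU, not VGG16 and not any T4 or NVIDIA API. On a GPU, kernel launches are asynchronous with respect to the host, so host timestamps like these would presumably need a device synchronization around each layer to be meaningful.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical representation: a layer is a (name, function) pair.
Layer = Tuple[str, Callable[[List[float]], List[float]]]


def profile_layers(layers: List[Layer], x: List[float]):
    """Run the layers in order and record a start/done timestamp pair per layer.

    Returns the final output and a list of (layer_name, latency_ms) tuples.
    """
    timings = []
    for name, fn in layers:
        start = time.perf_counter()   # "start" flag for this layer
        x = fn(x)                     # the layer's processing
        done = time.perf_counter()    # "done" flag for this layer
        timings.append((name, (done - start) * 1e3))  # seconds -> ms
    return x, timings


# Toy stand-ins for real layers (assumption: purely illustrative arithmetic).
layers = [
    ("CONV1", lambda v: [a * 2 for a in v]),
    ("CONV2", lambda v: [a + 1 for a in v]),
    ("FC6", lambda v: [sum(v)]),
]

out, timings = profile_layers(layers, [1.0, 2.0, 3.0])
for name, ms in timings:
    print(f"{name} layer - {ms:.6f} ms")
```

The printed lines follow the same "layer - latency" format as my example list above; the actual numbers depend on the machine, so none are shown here.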

I’m aware of these benchmarks:

Edge TPU performance benchmarks | Coral

for a VGG16 model, but they list inference latency for the entire VGG16 model and don’t have a layer-by-layer breakdown.

Thank you,
Nick Iliev, Ph.D.
Research Associate