Benchmarking for batch sizes 64 and 128

In the NVIDIA benchmarks at https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks,
can you explain how the models are benchmarked for batch sizes 64 and 128?

Since the DLA doesn’t support batch sizes greater than 32, what method is followed?

Hi manogna63, for the higher batch sizes, each DLA is run at the maximum batch size it supports for the given network, while the GPU can typically run the higher batch sizes directly. The reported result is the aggregate throughput across the device from running the GPU and the two DLAs concurrently.
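To make the aggregation concrete, here is a minimal sketch of how per-engine throughputs combine into a single device number. The batch sizes and latencies below are made-up placeholder values, not figures from the NVIDIA benchmarks; only the structure (GPU at the full batch, each DLA capped at 32, results summed) reflects the answer above.

```python
# Hypothetical illustration of aggregating throughput across engines.
# Latency values here are invented for the example; the real benchmarks
# report measured numbers.

def throughput(batch_size: int, latency_s: float) -> float:
    """Images per second for one engine processing one batch per pass."""
    return batch_size / latency_s

# GPU runs the full requested batch size (e.g. 128); each DLA is capped
# at its maximum supported batch size of 32.
gpu_ips = throughput(batch_size=128, latency_s=0.050)  # GPU at batch 128
dla_ips = throughput(batch_size=32,  latency_s=0.040)  # one DLA at batch 32

# Device-level result: GPU + 2x DLA running concurrently.
total_ips = gpu_ips + 2 * dla_ips

print(f"GPU: {gpu_ips:.0f} img/s, per DLA: {dla_ips:.0f} img/s, "
      f"device total: {total_ips:.0f} img/s")
```

The key point is that no single engine runs batch 64 or 128 on the DLA side; the headline number is the concurrent sum.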