Benchmarking for batch sizes 64 and 128

In the NVIDIA benchmarks at https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks,
can you explain how the models are benchmarked for batch sizes 64 and 128?

Since the DLA doesn’t support batch sizes greater than 32, what method is followed?

Hi manogna63, for the higher batch sizes, each DLA is run at the maximum batch size it supports for the given network, while the GPU can typically run the higher batch sizes directly. The reported result is the aggregate throughput across the device from running the GPU and the two DLAs concurrently.
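To make the aggregation concrete, here is a minimal sketch of how per-engine throughputs combine into a single device number. The batch sizes and latencies below are made-up placeholder values, not figures from the NVIDIA benchmarks; only the structure (GPU at the full batch, each DLA capped at 32, results summed) reflects the answer above.

```python
# Hypothetical illustration of aggregating throughput across engines.
# Latency values here are invented for the example; the real benchmarks
# report measured numbers.

def throughput(batch_size: int, latency_s: float) -> float:
    """Images per second for one engine processing one batch per pass."""
    return batch_size / latency_s

# GPU runs the full requested batch size (e.g. 128); each DLA is capped
# at its maximum supported batch size of 32.
gpu_ips = throughput(batch_size=128, latency_s=0.050)  # GPU at batch 128
dla_ips = throughput(batch_size=32,  latency_s=0.040)  # one DLA at batch 32

# Device-level result: GPU + 2x DLA running concurrently.
total_ips = gpu_ips + 2 * dla_ips

print(f"GPU: {gpu_ips:.0f} img/s, per DLA: {dla_ips:.0f} img/s, "
      f"device total: {total_ips:.0f} img/s")
```

The key point is that no single engine runs batch 64 or 128 on the DLA side; the headline number is the concurrent sum.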