TensorRT 6 benchmark numbers on Xavier: batch size info please?

NVIDIA has published MLPerf Inference 0.5 benchmark numbers here: https://devblogs.nvidia.com/nvidia-mlperf-v05-ai-inference/
I am unable to find detailed information such as:

  1. What batch size was used for the single-stream scenario?
  2. Are these GPU-only numbers, or GPU fps + 2x DLA fps running concurrently?
  3. How many streams were running in the multi-stream scenario?
  4. I assume MaxN and jetson_clocks were active — can you confirm?
It would be great to get answers to the above queries.
Thanks in advance!

Hi BMohit, you can find the code to reproduce the MLPerf Inference 0.5 benchmarks here: https://github.com/mlperf/inference_results_v0.5/tree/master/closed/NVIDIA

Also see here for the official results: https://mlperf.org/inference-results/

The objective of the single-stream scenario is to obtain the lowest latency, so it was batch size 1. Results for this scenario are reported in milliseconds.

GPU + 2x DLA were used concurrently for the multi-stream and offline scenarios. I believe single-stream used GPU only, as one image is processed at a time in that scenario.
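As a rough illustration of how the DLAs are engaged alongside the GPU, here is a sketch using TensorRT's `trtexec` tool (the model file `resnet50.onnx` is a placeholder; exact flags may vary by TensorRT version):

```shell
# Run on the GPU (the default device)
trtexec --onnx=resnet50.onnx --int8

# Run the same network on DLA core 0, falling back to the GPU
# for any layers the DLA cannot execute
trtexec --onnx=resnet50.onnx --int8 --useDLACore=0 --allowGPUFallback
```

Launching one GPU instance plus one instance per DLA core concurrently is what gives the combined GPU + 2x DLA throughput in the multi-stream and offline scenarios.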

You can see the number of streams that ran within the latency constraints reported in the results table for the multi-stream scenario.

MAX-N was used. I'm not sure about jetson_clocks, but the benchmarks run for a while under load, so the clocks would already have spun up.
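For reference, on a Jetson AGX Xavier the MAX-N power mode and locked clocks are typically set up like this before benchmarking:

```shell
# Select the MAX-N power mode (mode 0 on Xavier: all cores, max power budget)
sudo nvpmodel -m 0

# Pin CPU/GPU/EMC clocks to their maximum for the current power mode
sudo jetson_clocks

# Verify the active power mode
sudo nvpmodel -q
```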

Thanks for the useful information!