NVIDIA has published MLPerf Inference 0.5 benchmark numbers here: https://devblogs.nvidia.com/nvidia-mlperf-v05-ai-inference/
I am unable to find detailed information such as:
- What batch size was used for single-stream?
- Are these GPU-only numbers, or GPU fps + 2x DLA fps running concurrently?
- How many streams were running in the multi-stream scenario?
- Were MAX-N and jetson_clocks active? (I assume they were.)
It would be great to get answers to the above queries.
Thanks in advance!
Hi BMohit, you can find the code to reproduce the MLPerf Inference 0.5 benchmarks here: https://github.com/mlperf/inference_results_v0.5/tree/master/closed/NVIDIA
Also see here for the official results: https://mlperf.org/inference-results/
The objective of the single-stream scenario is to obtain the lowest latency, so batch size 1 was used. Results for this scenario are reported in milliseconds.
GPU + 2x DLA were used concurrently for the multi-stream and offline scenarios. I believe single-stream used GPU only, as one image is processed at a time in that scenario.
You can see the number of streams that ran within the latency constraints reported in the results table for the multi-stream scenario.
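To illustrate how that reported stream count comes about, here is a hedged sketch of the multi-stream metric: queries of N samples arrive at a fixed interval (50 ms in this toy example), and the result is the largest N whose tail latency still fits inside that interval. The interval, percentile, and measurements below are illustrative, not taken from the actual results.

```python
# Sketch of the multi-stream metric: given measured per-query latencies at
# each candidate stream count, report the largest count whose 99th-percentile
# latency stays within the arrival interval. Numbers here are made up.
import math

ARRIVAL_INTERVAL_MS = 50.0  # assumed fixed query arrival interval

def percentile(latencies, pct):
    """Nearest-rank percentile of a list of latencies (ms)."""
    ordered = sorted(latencies)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def max_streams(latency_by_count, pct=99.0):
    """Largest stream count whose tail latency fits the arrival interval."""
    best = 0
    for count, lats in sorted(latency_by_count.items()):
        if percentile(lats, pct) <= ARRIVAL_INTERVAL_MS:
            best = count
    return best

# Toy measurements (ms) -- not real benchmark data:
measured = {
    2: [10.0, 11.0, 12.0],
    4: [30.0, 35.0, 40.0],
    8: [48.0, 60.0, 70.0],
}
print(max_streams(measured))  # -> 4: the 8-stream tail latency exceeds 50 ms
```

The real harness (MLPerf LoadGen) does the measuring; this only shows why the table reports a stream count rather than a raw fps figure.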
MAX-N was used; I'm not sure about jetson_clocks, but the benchmarks run for a while under load, so the clocks would already be spun up.
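If you want to pin the clocks explicitly before your own runs, a minimal sketch, assuming a Jetson AGX Xavier where MAX-N is nvpmodel mode 0 (mode numbers vary by board, so check `sudo nvpmodel -q` on yours):

```shell
#!/bin/sh
# Sketch: pin clocks before benchmarking on a Jetson. Assumes AGX Xavier,
# where MAX-N is nvpmodel mode 0; nvpmodel and jetson_clocks ship with JetPack.
if command -v nvpmodel >/dev/null 2>&1; then
    sudo nvpmodel -m 0   # select the MAX-N power mode
    sudo jetson_clocks   # lock clocks at the maximum for that mode
    sudo nvpmodel -q     # confirm the active power mode
else
    echo "nvpmodel not found; run this on a Jetson device"
fi
```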
Thanks for the useful information!