details:
pc@pc-desktop:~/mlperf/inference_results_v2.0/closed/NVIDIA$ make run_harness RUN_ARGS="--benchmarks=resnet50 --scenarios=offline --test_mode=PerformanceOnly"
[2022-05-01 17:59:18,900 main.py:770 INFO] Detected System ID: KnownSystem.Orin
[2022-05-01 17:59:19,429 main.py:249 INFO] Running harness for resnet50 benchmark in Offline scenario...
[2022-05-01 17:59:19,437 init.py:43 INFO] Running command: ./build/bin/harness_default --logfile_outdir="/home/pc/mlperf/inference_results_v2.0/closed/NVIDIA/build/logs/2022.05.01-17.59.16/Orin_TRT/resnet50/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=2048 --test_mode="PerformanceOnly" --dla_batch_size=8 --dla_copy_streams=2 --dla_inference_streams=1 --gpu_copy_streams=2 --gpu_inference_streams=1 --use_direct_host_access=true --gpu_batch_size=256 --map_path="data_maps/imagenet/val_map.txt" --tensor_path="build/preprocessed_data/imagenet/ResNet50/int8_linear" --use_graphs=false --gpu_engines="./build/engines/Orin/resnet50/Offline/resnet50-Offline-gpu-b256-int8.lwis_k_99_MaxP.plan" --mlperf_conf_path="measurements/Orin_TRT/resnet50/Offline/mlperf.conf" --user_conf_path="measurements/Orin_TRT/resnet50/Offline/user.conf" --dla_engines="./build/engines/Orin/resnet50/Offline/resnet50-Offline-dla-b8-int8.lwis_k_99_MaxP.plan" --scenario Offline --model resnet50
[2022-05-01 17:59:19,437 init.py:50 INFO] Overriding Environment
benchmark : Benchmark.ResNet50
dla_batch_size : 8
dla_copy_streams : 2
dla_core : 0
dla_inference_streams : 1
gpu_batch_size : 256
gpu_copy_streams : 2
gpu_inference_streams : 1
input_dtype : int8
input_format : linear
map_path : data_maps/imagenet/val_map.txt
offline_expected_qps : 5700
precision : int8
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='ARMv8 Processor rev 1 (v8l)', architecture=<CPUArchitecture.aarch64: AliasedName(name='aarch64', aliases=(), patterns=())>, core_count=4, threads_per_core=1): 3}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=31.357616, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=31357616000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA Orin Jetson-Small Developer Kit', accelerator_type=<AcceleratorType.Integrated: AliasedName(name='Integrated', aliases=(), patterns=())>, vram=None, max_power_limit=None, pci_id=None, compute_sm=87): 1})), numa_conf=None, system_id='Orin')
tensor_path : build/preprocessed_data/imagenet/ResNet50/int8_linear
use_direct_host_access : True
use_graphs : False
config_name : Orin_resnet50_Offline
config_ver : lwis_k_99_MaxP
accuracy_level : 99%
optimization_level : plugin-enabled
inference_server : lwis
system_id : Orin
use_cpu : False
use_inferentia : False
soc_gpu_freq : None
soc_dla_freq : None
soc_cpu_freq : None
soc_emc_freq : None
orin_num_cores : None
test_mode : PerformanceOnly
openvino_version : f2f281e6
gpu_num_bundles : 2
log_dir : /home/pc/mlperf/inference_results_v2.0/closed/NVIDIA/build/logs/2022.05.01-17.59.16
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: measurements/Orin_TRT/resnet50/Offline/mlperf.conf
[I] user.conf path: measurements/Orin_TRT/resnet50/Offline/user.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] [MemUsageChange] Init CUDA: CPU +283, GPU +0, now: CPU 325, GPU 7884 (MiB)
[I] [TRT] Loaded engine size: 26 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +825, now: CPU 908, GPU 8778 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +84, GPU +134, now: CPU 992, GPU 8912 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +24, now: CPU 0, GPU 24 (MiB)
[I] Device:0: ./build/engines/Orin/resnet50/Offline/resnet50-Offline-gpu-b256-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] Loaded engine size: 25 MiB
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +25, GPU +0, now: CPU 25, GPU 24 (MiB)
[I] Device:0.DLA-0: ./build/engines/Orin/resnet50/Offline/resnet50-Offline-dla-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] Loaded engine size: 25 MiB
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +26, GPU +0, now: CPU 51, GPU 24 (MiB)
[I] Device:0.DLA-1: ./build/engines/Orin/resnet50/Offline/resnet50-Offline-dla-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1018, GPU 8958 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +4, now: CPU 1018, GPU 8962 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +392, now: CPU 51, GPU 416 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +7, now: CPU 1030, GPU 9415 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1030, GPU 9425 (MiB)
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +392, now: CPU 51, GPU 808 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 51, GPU 810 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 51, GPU 811 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 51, GPU 813 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 51, GPU 814 (MiB)
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Finished setting up SUT.
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.63439s.
Starting running actual test.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : Offline
Mode : PerformanceOnly
Samples per second: 4719.53
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: Yes
================================================
Additional Stats
================================================
Min latency (ns) : 63788073
Max latency (ns) : 797114066115
Mean latency (ns) : 398299385560
50.00 percentile latency (ns) : 398102674467
90.00 percentile latency (ns) : 717305385860
95.00 percentile latency (ns) : 757208441537
97.00 percentile latency (ns) : 773202000660
99.00 percentile latency (ns) : 789172108789
99.90 percentile latency (ns) : 796318428604
================================================
Test Parameters Used
================================================
samples_per_query : 3762000
target_qps : 5700
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 1
max_query_count : 0
qsl_rng_seed : 6655344265603136530
sample_index_rng_seed : 15863379492028895792
schedule_rng_seed : 12662793979680847247
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 2048
No warnings encountered during test.
No errors encountered during test.
Finished running actual test.
Device Device:0 processed:
11952 batches of size 256
Memcpy Calls: 373504
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 0
Device Device:0.DLA-0 processed:
43913 batches of size 8
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 0
Device Device:0.DLA-1 processed:
43873 batches of size 8
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 0
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2022-05-01 18:12:46,151 main.py:304 INFO] Result: result_samples_per_second: 4719.53, Result is VALID
======================= Perf harness results: =======================
Orin_TRT-lwis_k_99_MaxP-Offline:
resnet50: result_samples_per_second: 4719.53, Result is VALID
======================= Accuracy results: =======================
Orin_TRT-lwis_k_99_MaxP-Offline:
resnet50: No accuracy results in PerformanceOnly mode.
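
For anyone reading the numbers above, here is a small sanity check of how they fit together. It is only a sketch: every figure is copied by hand from this log, and the variable names are my own. The 1.1 slack factor used to reproduce samples_per_query is an assumption about how the Offline query is sized; it happens to match this run exactly. Treating the max latency as the wall time of the single Offline query is also an assumption.

```python
# Figures copied by hand from the log above.
target_qps = 5700                  # offline_expected_qps / target_qps
min_duration_ms = 600_000          # min_duration (ms)
samples_per_query = 3_762_000      # samples_per_query
max_latency_ns = 797_114_066_115   # Max latency (ns)
reported_sps = 4719.53             # Samples per second

# (number of batches, batch size) per device, from the "processed" lines.
batches = {
    "Device:0": (11952, 256),
    "Device:0.DLA-0": (43913, 8),
    "Device:0.DLA-1": (43873, 8),
}

# Samples handled by each device, and the total across GPU + both DLAs.
samples = {dev: n * size for dev, (n, size) in batches.items()}
total = sum(samples.values())

# Assumption: the Offline query holds target_qps * min_duration * 1.1 samples.
# Integer arithmetic (x11 / 10) avoids floating-point rounding.
expected_query_size = target_qps * min_duration_ms * 11 // 10_000

# Assumption: the max latency is the time at which the last sample of the
# single Offline query completed, so throughput = samples / that time.
throughput = samples_per_query / (max_latency_ns / 1e9)

print(f"total samples processed : {total}")
print(f"expected query size     : {expected_query_size}")
print(f"recomputed samples/sec  : {throughput:.2f} (log says {reported_sps})")
for dev, count in samples.items():
    print(f"{dev:>15}: {count} samples ({count / total:.1%})")
```

The per-device batch counts add up to exactly samples_per_query, and dividing that by the max latency reproduces the reported 4719.53 samples per second. The GPU handles roughly 81% of the samples, with the two DLA cores splitting the rest about evenly. The measured throughput is below the configured offline_expected_qps of 5700, which is consistent with the run taking about 797 s instead of the roughly 660 s the query was sized for.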