Failed to run mlperf inference ResNet50 benchmarks in Docker

Hi all,
I am new to the NVIDIA AI world. I want to check the MLPerf Inference ResNet50 benchmark result on my A30 GPU.
I followed the user guide below to set up the SUT for running the ResNet50 benchmark.

However, I get RuntimeError("Building engines failed!") and I have no idea how to fix it. Can anyone give me some suggestions? Thanks.
Below is the output.

(mlperf) test@mlperf-inference-test-x86_64:/work$ make run RUN_ARGS="--benchmarks=resnet50 --scenarios=offline"
make[1]: Entering directory '/work'
[2023-05-12 05:35:37,521 main_v2.py:221 INFO] Detected system ID: KnownSystem.AMD_K905
[2023-05-12 05:35:39,726 generate_engines.py:172 INFO] Building engines for resnet50 benchmark in Offline scenario...
[2023-05-12 05:35:39,752 ResNet50.py:36 INFO] Using workspace size: 0
[05/12/2023-05:35:40] [TRT] [I] [MemUsageChange] Init CUDA: CPU +320, GPU +0, now: CPU 354, GPU 642 (MiB)
[05/12/2023-05:35:41] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +414, GPU +132, now: CPU 787, GPU 774 (MiB)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 185, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/common/builder.py", line 170, in build_engines
    self.initialize()
  File "/work/code/resnet50/tensorrt/ResNet50.py", line 78, in initialize
    rn50_gs = RN50GraphSurgeon(self.model_path,
  File "/work/code/resnet50/tensorrt/rn50_graphsurgeon.py", line 328, in __init__
    if os.path.exists(self.cache_file):
  File "/usr/lib/python3.8/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
[2023-05-12 05:35:43,882 generate_engines.py:172 INFO] Building engines for resnet50 benchmark in Offline scenario...
[2023-05-12 05:35:43,899 ResNet50.py:36 INFO] Using workspace size: 0
[05/12/2023-05:35:44] [TRT] [I] [MemUsageChange] Init CUDA: CPU +320, GPU +0, now: CPU 354, GPU 642 (MiB)
[05/12/2023-05:35:45] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +414, GPU +132, now: CPU 787, GPU 774 (MiB)
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work/code/actionhandler/base.py", line 185, in subprocess_target
    return self.action_handler.handle()
  File "/work/code/actionhandler/generate_engines.py", line 175, in handle
    total_engine_build_time += self.build_engine(job)
  File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
    builder.build_engines()
  File "/work/code/common/builder.py", line 170, in build_engines
    self.initialize()
  File "/work/code/resnet50/tensorrt/ResNet50.py", line 78, in initialize
    rn50_gs = RN50GraphSurgeon(self.model_path,
  File "/work/code/resnet50/tensorrt/rn50_graphsurgeon.py", line 328, in __init__
    if os.path.exists(self.cache_file):
  File "/usr/lib/python3.8/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/work/code/main_v2.py", line 223, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/work/code/main_v2.py", line 147, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/work/code/main_v2.py", line 194, in dispatch_action
    handler.run()
  File "/work/code/actionhandler/base.py", line 79, in run
    self.handle_failure()
  File "/work/code/actionhandler/base.py", line 182, in handle_failure
    self.action_handler.handle_failure()
  File "/work/code/actionhandler/generate_engines.py", line 183, in handle_failure
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make[1]: *** [Makefile:694: generate_engines] Error 1
make[1]: Leaving directory '/work'

My SUT config:
OS: Ubuntu 22.04.2
GPU: A30 x2
Driver: 510.39.01
nvidia-ctk: 1.13.1

nvidia-smi.txt (14.8 KB)

I found the cause of this error: "python3 -m scripts.custom_systems.add_custom_system" creates empty values in the config for your SUT, and you need to fill in the necessary content before the test can run.
The GitHub repository has detailed information in the section "Adding a New or Custom System". Follow that section to avoid this error.
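
For anyone wondering why an empty config value ends in that TypeError, here is a minimal sketch. It is not the MLPerf code: the dictionary, its field names and the helper find_unset_fields are all made up for illustration. It only shows that passing None to os.path.exists raises the same kind of error as in the traceback, and how you might list the fields that still need a value.

```python
import os


def find_unset_fields(config: dict) -> list:
    """Return the names of fields whose value is None or an empty string."""
    return sorted(name for name, value in config.items()
                  if value is None or value == "")


# Hypothetical stand-in for an auto-generated config stub.
# The field names are assumptions, not the real MLPerf config schema.
stub_config = {
    "precision": "int8",
    "gpu_batch_size": None,
    "cache_file": None,
}

print("Fields still to fill in:", find_unset_fields(stub_config))

# os.path.exists() only swallows OSError/ValueError, so a None path
# surfaces as a TypeError, as in the traceback above.
try:
    os.path.exists(stub_config["cache_file"])
except TypeError as exc:
    print("TypeError:", exc)
```

Running a check like this on your own values before "make run" should make it easier to spot which entries were left empty.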

There are two ways to create the config file.

  1. Create a new config for your SUT
    $ python3 -m scripts.custom_systems.add_custom_system
    This creates config files for each benchmark and each scenario.
    You then need to copy the data from __init__.py into custom.py and modify it.
    You can modify the server_target_qps value to match your number of GPUs.

  2. Modify an existing configuration file
    Do not run $ python3 -m scripts.custom_systems.add_custom_system
    Just open configs/<benchmark>/<scenario>/__init__.py
    Copy the config of a similar system.
    For example, if the SUT is A100_PCIe_80Gx2, you can copy the A100_PCIe_80Gx8 data and change it to match your config.
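
As a rough illustration of "copy a similar config and change the data", the sketch below copies a reference config and scales the throughput target with the GPU count. Treat it as a starting point only: the real configs in the repository are Python classes rather than dictionaries, the helper derive_config is hypothetical, the numbers are placeholders and not measured results, and linear scaling with GPU count is an assumption you should verify by running the benchmark.

```python
import copy


def derive_config(reference: dict, reference_gpus: int, target_gpus: int,
                  qps_keys=("server_target_qps",)) -> dict:
    """Copy a reference config and scale throughput targets linearly
    with the number of GPUs (an assumption, not a guarantee)."""
    if reference_gpus <= 0 or target_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    derived = copy.deepcopy(reference)  # leave the reference untouched
    for key in qps_keys:
        if key in derived:
            derived[key] = derived[key] * target_gpus / reference_gpus
    return derived


# Placeholder values for an 8-GPU system, not real benchmark numbers.
a100x8 = {"gpu_batch_size": 64, "server_target_qps": 8000}

# Derive a 2-GPU starting point from the 8-GPU entry.
a100x2 = derive_config(a100x8, reference_gpus=8, target_gpus=2)
print(a100x2)
```

Per-GPU settings such as the batch size are left as copied here; only the fields named in qps_keys are scaled.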