Optical Inspection Deploy

I am trying to run the Optical Inspection program. I don't know how to set up the CSV for the validation set. If I put only one row of data in the validation CSV, the program runs. But if I put two rows in it, I get the following error.

2023-09-14 16:44:16,763 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-09-14 16:44:16,801 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy
2023-09-14 16:44:16,817 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 262: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/lab/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-09-14 16:44:16,817 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
2023-09-14 08:44:17,776 [TAO Toolkit] [INFO] matplotlib.font_manager 1544: generated new fontManager
python /usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/optical_inspection/scripts/inference.py  --config-path /specs --config-name experiment.yaml results_dir=/results inference.trt_engine=/results/export/oi_model.engine
sys:1: UserWarning: 
'experiment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen cv.common.hydra.hydra_runner>:99: UserWarning: 
'experiment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Log file already exists at /results/status.json
Starting optical_inspection inference.
Running inference
Instantiate the optical inspection inferencer.
Loading engine from /results/export/oi_model.engine
Instantiating the optical inspection dataloader.
Using 4 input and linear type 1 X 4 for comparison.
Number of sample batches: 1
Running inference
  0%|                                                     | 0/1 [00:00<?, ?it/s]
image_paths list bigger (2) thanengine max batch size (1)
Error executing job with overrides: ['results_dir=/results', 'inference.trt_engine=/results/export/oi_model.engine']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 254, in run_and_report
    assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/optical_inspection/scripts/inference.py>", line 3, in <module>
  File "<frozen cv.optical_inspection.scripts.inference>", line 95, in <module>
  File "<frozen cv.common.hydra.hydra_runner>", line 99, in wrapper
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 296, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "<frozen cv.common.decorators>", line 63, in _func
  File "<frozen cv.common.decorators>", line 48, in _func
  File "<frozen cv.optical_inspection.scripts.inference>", line 82, in main
  File "<frozen cv.optical_inspection.inferencer>", line 111, in infer
ValueError: image_paths list bigger (2) thanengine max batch size (1)
Sending telemetry data.
Execution status: FAIL
2023-09-14 16:44:39,682 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.

This is the validation set I used:
Screenshot from 2023-09-14 17-01-00

May I ask how to set up the validation CSV if I need to pass in multiple rows of data?
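For reference, a minimal sanity check like the one below can confirm that every row of the validation CSV points at files that actually exist inside the mounted paths (the column names input_path and golden_path are assumptions for illustration, not taken from the TAO docs; adjust them to match your header):

```python
# sanity_check_csv.py -- minimal sketch, not part of TAO.
# Assumes the validation CSV has columns named "input_path" and
# "golden_path" (hypothetical names; adjust to your actual header).
import csv
import os
import sys

csv_path = sys.argv[1] if len(sys.argv) > 1 else "inference.csv"

with open(csv_path, newline="") as f:
    rows = list(csv.DictReader(f))

print(f"{csv_path}: {len(rows)} data row(s)")

for i, row in enumerate(rows, start=1):
    for col in ("input_path", "golden_path"):  # assumed column names
        path = row.get(col, "")
        if path and not os.path.exists(path):
            print(f"row {i}: {col} does not exist on disk: {path}")
```

With two rows present, the error in the log above complains about the engine's max batch size rather than about the files themselves.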

Can you share the yaml file?

This is the yaml file I used.
experiment.yaml (1.9 KB)

How about running tao model optical_inspection inference?

The error comes from https://github.com/NVIDIA/tao_deploy/blob/28d471b5cabc226ab131793637f869287fca9393/nvidia_tao_deploy/cv/optical_inspection/inferencer.py#L104. Please increase max_batch_size when you generate the TensorRT engine.

See SiameseOI with TAO Deploy - NVIDIA Docs
max_batch_size: int = 2
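To confirm that a regenerated engine was really built with a larger batch dimension, you can deserialize it and print its binding shapes. Below is a minimal sketch assuming a TensorRT 8.x Python API (the engine path is the one from your log; this script is only for verification and is not part of TAO):

```python
# inspect_engine.py -- minimal sketch, assuming a TensorRT 8.x Python API.
# Deserializes the engine and prints each binding's shape so the batch
# dimension the engine was built with can be checked.
import tensorrt as trt

ENGINE_PATH = "/results/export/oi_model.engine"  # path taken from the log above

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")

with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    kind = "input " if engine.binding_is_input(i) else "output"
    print(f"{kind} {engine.get_binding_name(i)}: {tuple(engine.get_binding_shape(i))}")
```

For a dynamic-shape engine the batch dimension prints as -1, and the effective limit is whatever max_batch_size was set in the optimization profile when the engine was generated.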

I am using the officially provided ONNX model file deployable_v1.0. After I added the setting max_batch_size: int = 2 and regenerated the engine, the following error occurred when I ran inference again.

2023-09-22 15:46:25,742 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2023-09-22 15:46:25,779 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy
2023-09-22 15:46:25,795 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 262: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/lab/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2023-09-22 15:46:25,795 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 275: Printing tty value True
2023-09-22 07:46:26,705 [TAO Toolkit] [INFO] matplotlib.font_manager 1544: generated new fontManager
python /usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/optical_inspection/scripts/inference.py  --config-path /specs --config-name experiment.yaml results_dir=/results inference.trt_engine=/results/export/oi_model.engine
sys:1: UserWarning: 
'experiment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen cv.common.hydra.hydra_runner>:99: UserWarning: 
'experiment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Log file already exists at /results/status.json
Starting optical_inspection inference.
Running inference
Instantiate the optical inspection inferencer.
Loading engine from /results/export/oi_model.engine
Exception ignored in: <function OpticalInspectionInferencer.__del__ at 0x7fb3f20674c0>
Traceback (most recent call last):
  File "<frozen cv.optical_inspection.inferencer>", line 143, in __del__
AttributeError: 'OpticalInspectionInferencer' object has no attribute 'stream'
cuMemHostAlloc failed: out of memory
Error executing job with overrides: ['results_dir=/results', 'inference.trt_engine=/results/export/oi_model.engine']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 254, in run_and_report
    assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/optical_inspection/scripts/inference.py>", line 3, in <module>
  File "<frozen cv.optical_inspection.scripts.inference>", line 95, in <module>
  File "<frozen cv.common.hydra.hydra_runner>", line 99, in wrapper
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 296, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "<frozen cv.common.decorators>", line 63, in _func
  File "<frozen cv.common.decorators>", line 48, in _func
  File "<frozen cv.optical_inspection.scripts.inference>", line 69, in main
  File "<frozen cv.optical_inspection.inferencer>", line 84, in __init__
  File "<frozen inferencer.utils>", line 141, in allocate_buffers
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory
Sending telemetry data.
Execution status: FAIL
2023-09-22 15:46:42,759 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 337: Stopping container.

It seems to be an out-of-memory issue. Please check the GPU memory with nvidia-smi. Also, you can try setting a lower max_batch_size and a smaller input width/height.
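If it helps, device memory can also be checked programmatically. Here is a minimal sketch with pynvml (the nvidia-ml-py package, which is an assumption here and not part of TAO; nvidia-smi reports the same numbers):

```python
# gpu_mem_check.py -- minimal sketch using pynvml (nvidia-ml-py), assumed to
# be installed separately; it reports the same figures as `nvidia-smi`.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):  # older pynvml versions return bytes
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i} ({name}): {mem.used / 1024**2:.0f} MiB used "
          f"/ {mem.total / 1024**2:.0f} MiB total")
pynvml.nvmlShutdown()
```

Lowering max_batch_size or the input width/height in the spec reduces the buffers the inferencer has to allocate, which is usually enough to get past this kind of allocation failure.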