• Hardware: NVIDIA TITAN Xp. The computer has an Intel® Xeon® CPU X5680 @ 3.33GHz × 12 with 24 GB of RAM and is running Ubuntu 22.04.5 LTS.
• Network Type: Detectnet_v2
• TAO Version (Please run "tlt info --verbose" and share "docker_tag" here)
(launcher) harold@TrainingComp:~/workspace/tao-experiments/data/training/image_2$ tao info --verbose
Configuration of the TAO Toolkit Instance
task_group:
    model:
        dockers:
            nvidia/tao/tao-toolkit:
                5.5.0-pyt:
                    docker_registry: nvcr.io
                    tasks:
                        1. action_recognition
                        2. centerpose
                        3. visual_changenet
                        4. deformable_detr
                        5. dino
                        6. grounding_dino
                        7. mask_grounding_dino
                        8. mask2former
                        9. mal
                        10. ml_recog
                        11. ocdnet
                        12. ocrnet
                        13. optical_inspection
                        14. pointpillars
                        15. pose_classification
                        16. re_identification
                        17. classification_pyt
                        18. segformer
                        19. bevfusion
                5.0.0-tf1.15.5:
                    docker_registry: nvcr.io
                    tasks:
                        1. bpnet
                        2. classification_tf1
                        3. converter
                        4. detectnet_v2
                        5. dssd
                        6. efficientdet_tf1
                        7. faster_rcnn
                        8. fpenet
                        9. lprnet
                        10. mask_rcnn
                        11. multitask_classification
                        12. retinanet
                        13. ssd
                        14. unet
                        15. yolo_v3
                        16. yolo_v4
                        17. yolo_v4_tiny
                5.5.0-tf2:
                    docker_registry: nvcr.io
                    tasks:
                        1. classification_tf2
                        2. efficientdet_tf2
    dataset:
        dockers:
            nvidia/tao/tao-toolkit:
                5.5.0-data-services:
                    docker_registry: nvcr.io
                    tasks:
                        1. augmentation
                        2. auto_label
                        3. annotations
                        4. analytics
    deploy:
        dockers:
            nvidia/tao/tao-toolkit:
                5.5.0-deploy:
                    docker_registry: nvcr.io
                    tasks:
                        1. visual_changenet
                        2. centerpose
                        3. classification_pyt
                        4. classification_tf1
                        5. classification_tf2
                        6. deformable_detr
                        7. detectnet_v2
                        8. dino
                        9. dssd
                        10. efficientdet_tf1
                        11. efficientdet_tf2
                        12. faster_rcnn
                        13. grounding_dino
                        14. mask_grounding_dino
                        15. mask2former
                        16. lprnet
                        17. mask_rcnn
                        18. ml_recog
                        19. multitask_classification
                        20. ocdnet
                        21. ocrnet
                        22. optical_inspection
                        23. retinanet
                        24. segformer
                        25. ssd
                        26. trtexec
                        27. unet
                        28. yolo_v3
                        29. yolo_v4
                        30. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024
I am trying to evaluate the .engine file from a TrafficCamNet model that I trained on a small number of my own images, generated by following the detectnet_v2 Jupyter notebook. I want to compare this engine against the stock TrafficCamNet v1.0.4. (I don't expect better results yet, as I only had 514 annotated images to train with, all of which had the same background.)
The cell I am running is:
!tao deploy detectnet_v2 evaluate \
    -m /workspace/tao-experiments/detectnet_v2/experiment_dir_final/trafficcamnet_detector_pruned.engine \
    -e /workspace/tao-experiments/detectnet_v2/specs/H4_evaluation_spec.txt \
    -r /workspace/tao-experiments/detectnet_v2/experiment_dir_final/results \
    -b 1
The latest evaluation spec file I tried was:
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tao-experiments/data/training/image_2"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  validation_fold: 0
}
The output of the cell was as follows:
2025-03-06 13:10:08,727 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-03-06 13:10:08,883 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-deploy
2025-03-06 13:10:08,923 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/harold/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-03-06 13:10:08,923 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2025-03-06 21:10:10,813 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
Loading uff directly from the package source code
2025-03-06 21:10:12,280 [DEBUG] matplotlib: matplotlib data path: /usr/local/lib/python3.10/dist-packages/matplotlib/mpl-data
2025-03-06 21:10:12,284 [DEBUG] matplotlib: CONFIGDIR=/root/.config/matplotlib
2025-03-06 21:10:12,285 [DEBUG] matplotlib: interactive is False
2025-03-06 21:10:12,285 [DEBUG] matplotlib: platform is linux
2025-03-06 21:10:12,325 [DEBUG] matplotlib: CACHEDIR=/root/.cache/matplotlib
2025-03-06 21:10:12,326 [DEBUG] matplotlib.font_manager: Using fontManager instance from /root/.cache/matplotlib/fontlist-v390.json
2025-03-06 21:10:12,492 [INFO] nvidia_tao_deploy.cv.common.logging.status_logging: Log file already exists at /workspace/tao-experiments/detectnet_v2/experiment_dir_final/results/status.json
2025-03-06 21:10:12,493 [INFO] root: Starting detectnet_v2 evaluation.
[03/06/2025-21:10:12] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
2025-03-06 21:10:12,517 [INFO] root: empty image dir or batch size too large!
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/detectnet_v2/scripts/evaluate.py", line 191, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/common/decorators.py", line 63, in _func
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/common/decorators.py", line 47, in _func
    runner(cfg, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/detectnet_v2/scripts/evaluate.py", line 69, in main
    dl = DetectNetKITTILoader(
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/detectnet_v2/dataloader.py", line 33, in __init__
    super().__init__(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/dataloader/kitti.py", line 93, in __init__
    assert self.n_batches > 0, "empty image dir or batch size too large!"
AssertionError: empty image dir or batch size too large!
[2025-03-06 21:10:12,858 - TAO Toolkit - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto - INFO] Sending telemetry data.
[2025-03-06 21:10:12,859 - TAO Toolkit - root - INFO] ================> Start Reporting Telemetry <================
[2025-03-06 21:10:12,859 - TAO Toolkit - root - INFO] Sending {'version': '5.5.0', 'action': 'evaluate', 'network': 'detectnet_v2', 'gpu': ['NVIDIA-TITAN-Xp'], 'success': False, 'time_lapsed': 1.514575719833374} to https://api.tao.ngc.nvidia.com.
[2025-03-06 21:10:13,345 - TAO Toolkit - root - INFO] Telemetry sent successfully.
[2025-03-06 21:10:13,346 - TAO Toolkit - root - INFO] ================> End Reporting Telemetry <================
[2025-03-06 21:10:13,346 - TAO Toolkit - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto - INFO] Execution status: FAIL
2025-03-06 13:10:13,650 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
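From the traceback, my reading is that the evaluation never reaches the engine: the KITTI dataloader builds its image list from image_directory_path plus image_extension and then asserts that at least one full batch is available. A simplified sketch of that check, as I understand it (not the actual nvidia_tao_deploy code), would be:

from pathlib import Path

def count_batches(image_dir: str, image_extension: str, batch_size: int) -> int:
    # Rough reconstruction of the check behind
    # "empty image dir or batch size too large!" -- not the real implementation.
    image_paths = sorted(Path(image_dir).glob(f"*.{image_extension}"))
    n_batches = len(image_paths) // batch_size
    assert n_batches > 0, "empty image dir or batch size too large!"
    return n_batches

If that reading is right, then with -b 1 the assertion should only fire when the loader finds zero matching images, which makes me suspect the container is not seeing my images at all (a mount or extension issue) rather than a batch-size problem.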
I’ve tried a number of different things, including using my retrain spec file as well as the sample spec file from the NVIDIA docs hub, but I haven’t been able to get anything to work.
Please let me know if you have any advice.