• Hardware: NVIDIA TITAN Xp. The computer has an Intel® Xeon® CPU X5680 @ 3.33GHz × 12 with 24 GB of RAM and is running Ubuntu 22.04.5 LTS.
• Network Type: Detectnet_v2
• TAO Version (Please run "tlt info --verbose" and share "docker_tag" here)
(launcher) harold@TrainingComp:~/workspace/tao-experiments/data/training/image_2$ tao info --verbose
Configuration of the TAO Toolkit Instance
task_group:
    model:
        dockers:
            nvidia/tao/tao-toolkit:
                5.5.0-pyt:
                    docker_registry: nvcr.io
                    tasks:
                        1. action_recognition
                        2. centerpose
                        3. visual_changenet
                        4. deformable_detr
                        5. dino
                        6. grounding_dino
                        7. mask_grounding_dino
                        8. mask2former
                        9. mal
                        10. ml_recog
                        11. ocdnet
                        12. ocrnet
                        13. optical_inspection
                        14. pointpillars
                        15. pose_classification
                        16. re_identification
                        17. classification_pyt
                        18. segformer
                        19. bevfusion
                5.0.0-tf1.15.5:
                    docker_registry: nvcr.io
                    tasks:
                        1. bpnet
                        2. classification_tf1
                        3. converter
                        4. detectnet_v2
                        5. dssd
                        6. efficientdet_tf1
                        7. faster_rcnn
                        8. fpenet
                        9. lprnet
                        10. mask_rcnn
                        11. multitask_classification
                        12. retinanet
                        13. ssd
                        14. unet
                        15. yolo_v3
                        16. yolo_v4
                        17. yolo_v4_tiny
                5.5.0-tf2:
                    docker_registry: nvcr.io
                    tasks:
                        1. classification_tf2
                        2. efficientdet_tf2
    dataset:
        dockers:
            nvidia/tao/tao-toolkit:
                5.5.0-data-services:
                    docker_registry: nvcr.io
                    tasks:
                        1. augmentation
                        2. auto_label
                        3. annotations
                        4. analytics
    deploy:
        dockers:
            nvidia/tao/tao-toolkit:
                5.5.0-deploy:
                    docker_registry: nvcr.io
                    tasks:
                        1. visual_changenet
                        2. centerpose
                        3. classification_pyt
                        4. classification_tf1
                        5. classification_tf2
                        6. deformable_detr
                        7. detectnet_v2
                        8. dino
                        9. dssd
                        10. efficientdet_tf1
                        11. efficientdet_tf2
                        12. faster_rcnn
                        13. grounding_dino
                        14. mask_grounding_dino
                        15. mask2former
                        16. lprnet
                        17. mask_rcnn
                        18. ml_recog
                        19. multitask_classification
                        20. ocdnet
                        21. ocrnet
                        22. optical_inspection
                        23. retinanet
                        24. segformer
                        25. ssd
                        26. trtexec
                        27. unet
                        28. yolo_v3
                        29. yolo_v4
                        30. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024
I am trying to evaluate the .engine file from a TrafficCamNet model that I trained on a small number of my own images, generated by following the detectnet_v2 Jupyter notebook. I want to compare this engine against the stock TrafficCamNet v1.0.4. (I don't expect better results yet, as I only had 514 annotated images to train with, all of which had the same background.)
The cell I am running is:
!tao deploy detectnet_v2 evaluate \
    -m /workspace/tao-experiments/detectnet_v2/experiment_dir_final/trafficcamnet_detector_pruned.engine \
    -e /workspace/tao-experiments/detectnet_v2/specs/H4_evaluation_spec.txt \
    -r /workspace/tao-experiments/detectnet_v2/experiment_dir_final/results \
    -b 1
The latest evaluation spec file I tried was:
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tao-experiments/data/training/image_2"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  validation_fold: 0
}
The output of the cell was as follows:
2025-03-06 13:10:08,727 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-03-06 13:10:08,883 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-deploy
2025-03-06 13:10:08,923 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/harold/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-03-06 13:10:08,923 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2025-03-06 21:10:10,813 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
Loading uff directly from the package source code
2025-03-06 21:10:12,280 [DEBUG] matplotlib: matplotlib data path: /usr/local/lib/python3.10/dist-packages/matplotlib/mpl-data
2025-03-06 21:10:12,284 [DEBUG] matplotlib: CONFIGDIR=/root/.config/matplotlib
2025-03-06 21:10:12,285 [DEBUG] matplotlib: interactive is False
2025-03-06 21:10:12,285 [DEBUG] matplotlib: platform is linux
2025-03-06 21:10:12,325 [DEBUG] matplotlib: CACHEDIR=/root/.cache/matplotlib
2025-03-06 21:10:12,326 [DEBUG] matplotlib.font_manager: Using fontManager instance from /root/.cache/matplotlib/fontlist-v390.json
2025-03-06 21:10:12,492 [INFO] nvidia_tao_deploy.cv.common.logging.status_logging: Log file already exists at /workspace/tao-experiments/detectnet_v2/experiment_dir_final/results/status.json
2025-03-06 21:10:12,493 [INFO] root: Starting detectnet_v2 evaluation.
[03/06/2025-21:10:12] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
2025-03-06 21:10:12,517 [INFO] root: empty image dir or batch size too large!
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/detectnet_v2/scripts/evaluate.py", line 191, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/common/decorators.py", line 63, in _func
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/common/decorators.py", line 47, in _func
    runner(cfg, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/detectnet_v2/scripts/evaluate.py", line 69, in main
    dl = DetectNetKITTILoader(
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/cv/detectnet_v2/dataloader.py", line 33, in __init__
    super().__init__(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/dataloader/kitti.py", line 93, in __init__
    assert self.n_batches > 0, "empty image dir or batch size too large!"
AssertionError: empty image dir or batch size too large!
[2025-03-06 21:10:12,858 - TAO Toolkit - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto - INFO] Sending telemetry data.
[2025-03-06 21:10:12,859 - TAO Toolkit - root - INFO] ================> Start Reporting Telemetry <================
[2025-03-06 21:10:12,859 - TAO Toolkit - root - INFO] Sending {'version': '5.5.0', 'action': 'evaluate', 'network': 'detectnet_v2', 'gpu': ['NVIDIA-TITAN-Xp'], 'success': False, 'time_lapsed': 1.514575719833374} to https://api.tao.ngc.nvidia.com.
[2025-03-06 21:10:13,345 - TAO Toolkit - root - INFO] Telemetry sent successfully.
[2025-03-06 21:10:13,346 - TAO Toolkit - root - INFO] ================> End Reporting Telemetry <================
[2025-03-06 21:10:13,346 - TAO Toolkit - nvidia_tao_deploy.cv.common.entrypoint.entrypoint_proto - INFO] Execution status: FAIL
2025-03-06 13:10:13,650 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
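From the traceback, my reading is that the evaluation never reaches the engine: the KITTI dataloader builds its image list from image_directory_path plus image_extension and then asserts that at least one full batch is available. A simplified sketch of that check, as I understand it (not the actual nvidia_tao_deploy code), would be:

from pathlib import Path

def count_batches(image_dir: str, image_extension: str, batch_size: int) -> int:
    # Rough reconstruction of the check behind
    # "empty image dir or batch size too large!" -- not the real implementation.
    image_paths = sorted(Path(image_dir).glob(f"*.{image_extension}"))
    n_batches = len(image_paths) // batch_size
    assert n_batches > 0, "empty image dir or batch size too large!"
    return n_batches

If that reading is right, then with -b 1 the assertion should only fire when the loader finds zero matching images, which makes me suspect the container is not seeing my images at all (a mount or extension issue) rather than a batch-size problem.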
I’ve tried a number of different things, including using my retrain spec file as well as the sample spec file from the NVIDIA docs hub, but I haven’t been able to get anything to work.
Please let me know if you have any advice.