Please provide the following information when requesting support.
• Hardware (T4)
• Network Type (Detectnet_v2)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
Configuration of the TAO Toolkit Instance
task_group:
    model:
        dockers:
            nvidia/tao/tao-toolkit:
                5.0.0-tf2.11.0:
                    docker_registry: nvcr.io
                    tasks:
                        1. classification_tf2
                        2. efficientdet_tf2
                5.0.0-tf1.15.5:
                    docker_registry: nvcr.io
                    tasks:
                        1. bpnet
                        2. classification_tf1
                        3. converter
                        4. detectnet_v2
                        5. dssd
                        6. efficientdet_tf1
                        7. faster_rcnn
                        8. fpenet
                        9. lprnet
                        10. mask_rcnn
                        11. multitask_classification
                        12. retinanet
                        13. ssd
                        14. unet
                        15. yolo_v3
                        16. yolo_v4
                        17. yolo_v4_tiny
                5.2.0-pyt2.1.0:
                    docker_registry: nvcr.io
                    tasks:
                        1. action_recognition
                        2. centerpose
                        3. deformable_detr
                        4. dino
                        5. mal
                        6. ml_recog
                        7. ocdnet
                        8. ocrnet
                        9. optical_inspection
                        10. pointpillars
                        11. pose_classification
                        12. re_identification
                        13. visual_changenet
                5.2.0-pyt1.14.0:
                    docker_registry: nvcr.io
                    tasks:
                        1. classification_pyt
                        2. segformer
    dataset:
        dockers:
            nvidia/tao/tao-toolkit:
                5.2.0-data-services:
                    docker_registry: nvcr.io
                    tasks:
                        1. augmentation
                        2. auto_label
                        3. annotations
                        4. analytics
    deploy:
        dockers:
            nvidia/tao/tao-toolkit:
                5.2.0-deploy:
                    docker_registry: nvcr.io
                    tasks:
                        1. visual_changenet
                        2. centerpose
                        3. classification_pyt
                        4. classification_tf1
                        5. classification_tf2
                        6. deformable_detr
                        7. detectnet_v2
                        8. dino
                        9. dssd
                        10. efficientdet_tf1
                        11. efficientdet_tf2
                        12. faster_rcnn
                        13. lprnet
                        14. mask_rcnn
                        15. ml_recog
                        16. multitask_classification
                        17. ocdnet
                        18. ocrnet
                        19. optical_inspection
                        20. retinanet
                        21. segformer
                        22. ssd
                        23. trtexec
                        24. unet
                        25. yolo_v3
                        26. yolo_v4
                        27. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.2.0
published_date: 12/06/2023
• Training spec file (if you have one, please share it here)
detectnet_v2_train_resnet18_kitti.txt (3.3 KB)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
!tao model detectnet_v2 train -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -k $KEY \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS \
                        --use_amp
2024-01-15 12:31:42,838 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-01-15 12:31:42,913 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-01-15 12:31:42,927 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2024-01-15 04:31:43.605353: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-01-15 04:31:43,657 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2024-01-15 04:31:45,345 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2024-01-15 04:31:45,387 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2024-01-15 04:31:45,391 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-01-15 04:31:47,259 [TAO Toolkit] [WARNING] matplotlib 500: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-6fagkams because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-01-15 04:31:47,569 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2024-01-15 04:31:50,010 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2024-01-15 04:31:50,050 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-01-15 04:31:50,054 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2024-01-15 04:31:51,888 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.logging.logging 197: Log file already exists at /workspace/tao-experiments/detectnet_v2/experiment_dir_unpruned/status.json
2024-01-15 04:31:51,888 [TAO Toolkit] [INFO] root 2102: Starting DetectNet_v2 Training job
2024-01-15 04:31:51,888 [TAO Toolkit] [INFO] __main__ 817: Loading experiment spec at /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt.
2024-01-15 04:31:51,889 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.spec_handler.spec_loader 113: Merging specification from /workspace/tao-experiments/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt
2024-01-15 04:31:51,890 [TAO Toolkit] [INFO] root 2102: 46:29 : ' dbscan_min_samples: 0.0500000007451': Couldn't parse integer: 0.0500000007451
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1702, in _ParseAbstractInteger
    return int(text, 0)
ValueError: invalid literal for int() with base 0: '0.0500000007451'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1652, in _ConsumeInteger
    result = ParseInteger(tokenizer.token, is_signed=is_signed, is_long=is_long)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1674, in ParseInteger
    result = _ParseAbstractInteger(text)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1704, in _ParseAbstractInteger
    raise ValueError('Couldn\'t parse integer: %s' % orig_text)
ValueError: Couldn't parse integer: 0.0500000007451

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1067, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1046, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
    return_args = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 1024, in main
    run_experiment(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/scripts/train.py", line 821, in run_experiment
    experiment_spec = load_experiment_spec(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/spec_handler/spec_loader.py", line 136, in load_experiment_spec
    experiment_spec = load_proto(spec_path, experiment_spec, default_spec_path,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/spec_handler/spec_loader.py", line 114, in load_proto
    _load_from_file(spec_path, proto_buffer)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/spec_handler/spec_loader.py", line 100, in _load_from_file
    merge_text_proto(f.read(), pb2)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 719, in Merge
    return MergeLines(
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 793, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 818, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 837, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 967, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1042, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 967, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1042, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 967, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1042, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 967, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1042, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 967, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1076, in _MergeScalarField
    value = _ConsumeInt32(tokenizer)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1573, in _ConsumeInt32
    return _ConsumeInteger(tokenizer, is_signed=True, is_long=False)
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py", line 1654, in _ConsumeInteger
    raise tokenizer.ParseError(str(e))
google.protobuf.text_format.ParseError: 46:29 : ' dbscan_min_samples: 0.0500000007451': Couldn't parse integer: 0.0500000007451
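
For reference, the traceback suggests that in this container the parser reads dbscan_min_samples through _ConsumeInt32, which ends in int(text, 0), so a value such as 0.0500000007451 is rejected at line 46 of the spec. Below is a minimal sketch that flags such values in a spec file before launching training. It only mimics the int(text, 0) call shown in the traceback; it does not use the real TAO proto definition, and the helper names and the sample spec fragment are hypothetical.

```python
import re

# Matches "dbscan_min_samples: <value>" lines in a text-format spec.
# The field name comes from the error message; treating it as an integer
# field is an assumption based on the _ConsumeInt32 frame in the traceback.
FIELD_RE = re.compile(r"^\s*(dbscan_min_samples)\s*:\s*(\S+)\s*$")


def is_protobuf_int(text):
    """Return True if int(text, 0) accepts the token, as in the traceback."""
    try:
        int(text, 0)
        return True
    except ValueError:
        return False


def find_non_integer_values(spec_text):
    """Return (line_number, field, value) for each value int(text, 0) rejects."""
    problems = []
    for line_number, line in enumerate(spec_text.splitlines(), start=1):
        match = FIELD_RE.match(line)
        if match and not is_protobuf_int(match.group(2)):
            problems.append((line_number, match.group(1), match.group(2)))
    return problems


# Hypothetical spec fragment for illustration; not the attached spec file.
sample_spec = """\
clustering_config {
  dbscan_eps: 0.15
  dbscan_min_samples: 0.0500000007451
}
"""

for line_number, field, value in find_non_integer_values(sample_spec):
    print(f"line {line_number}: {field} = {value} is not an integer literal")
```

To check the real file, read it with open(path).read() and pass the text to find_non_integer_values. Whether an integer value is what this TAO release intends for dbscan_min_samples is something the TAO documentation or maintainers would need to confirm.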