TLT 3.0: error during evaluation when retraining TrafficCamNet

When I retrained the TrafficCamNet model on my own custom dataset (converted to TFRecords before feeding it into the TLT toolkit), everything went well during the training stage and the log showed the epochs and steps as expected; however, the following error occurred during the validation/evaluation stage.

2021-04-28 06:51:33,395 [INFO] tensorflow: global_step/sec: 1.73384
INFO:tensorflow:epoch = 0.9116279069767441, loss = 0.04012449, step = 196 (5.852 sec)
2021-04-28 06:51:37,409 [INFO] tensorflow: epoch = 0.9116279069767441, loss = 0.04012449, step = 196 (5.852 sec)
2021-04-28 06:51:39,116 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 69.516
INFO:tensorflow:epoch = 0.958139534883721, loss = 0.039332416, step = 206 (5.711 sec)
2021-04-28 06:51:43,120 [INFO] tensorflow: epoch = 0.958139534883721, loss = 0.039332416, step = 206 (5.711 sec)
INFO:tensorflow:global_step/sec: 1.7304
2021-04-28 06:51:45,531 [INFO] tensorflow: global_step/sec: 1.7304
89f5a48d3120:63:107 [0] NCCL INFO Bootstrap : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.2<0>
89f5a48d3120:63:107 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
89f5a48d3120:63:107 [0] NCCL INFO NET/IB : No device found.
89f5a48d3120:63:107 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.2<0>
89f5a48d3120:63:107 [0] NCCL INFO Using network Socket
NCCL version 2.7.8+cuda11.1

89f5a48d3120:63:107 [0] NCCL INFO Channel 00/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 01/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 02/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 03/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 04/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 05/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 06/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 07/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 08/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 09/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 10/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 11/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 12/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 13/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 14/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 15/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 16/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 17/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 18/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 19/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 20/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 21/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 22/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 23/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 24/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 25/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 26/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 27/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 28/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 29/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 30/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Channel 31/32 : 0
89f5a48d3120:63:107 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [1] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [2] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [3] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [4] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [5] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [6] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [7] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [8] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [9] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [10] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [11] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [12] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [13] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [14] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [15] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [16] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [17] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [18] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [19] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [20] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [21] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [22] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [23] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [24] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [25] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [26] -1/-1/-1->0->-1|-1->0->-1/
89f5a48d3120:63:107 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
89f5a48d3120:63:107 [0] NCCL INFO comm 0x7fac9438de40 rank 0 nranks 1 cudaDev 0 busId 1000 - Init COMPLETE
2021-04-28 06:51:48,063 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 37, 0.00s/step
Traceback (most recent call last):
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 797, in
File “”, line 2, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py”, line 46, in wrapped_fn
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 790, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 691, in run_experiment
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 624, in train_gridbox
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 149, in run_training_loop
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 754, in run
run_metadata=run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1360, in run
raise six.reraise(*original_exc_info)
File “/usr/local/lib/python3.6/dist-packages/six.py”, line 696, in reraise
raise value
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1345, in run
return self._sess.run(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1426, in run
run_metadata=run_metadata))
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py”, line 79, in after_run
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py”, line 85, in validate
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/evaluation/evaluation.py”, line 165, in evaluate
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/postprocessor/postprocessing.py”, line 146, in cluster_predictions
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/postprocessor/cluster.py”, line 45, in cluster_predictions
AssertionError
Traceback (most recent call last):
File “/usr/local/bin/detectnet_v2”, line 8, in
sys.exit(main())
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/entrypoint/detectnet_v2.py”, line 12, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py”, line 296, in launch_job
AssertionError: Process run failed.
2021-04-28 14:52:04,113 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Additionally, I am not sure whether this comes from a dataset issue. Searching the forums for similar problems, someone suggested it is caused by a failed TFRecord conversion; however, the conversion log looks fine:
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-04-28 06:41:10,769 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2021-04-28 06:41:10,788 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in
Train: 8595 Val: 1516
2021-04-28 06:41:10,788 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2021-04-28 06:41:10,793 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 0
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py:142: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2021-04-28 06:41:10,794 - tensorflow - WARNING - From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py:142: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
2021-04-28 06:41:10,943 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1
2021-04-28 06:41:11,090 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 2
2021-04-28 06:41:11,220 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 3
2021-04-28 06:41:11,350 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 4
2021-04-28 06:41:11,476 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 5
2021-04-28 06:41:11,604 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 6
2021-04-28 06:41:11,734 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 7
2021-04-28 06:41:11,861 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 8
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06873.txt”
2021-04-28 06:41:11,989 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 9
2021-04-28 06:41:12,124 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
b’pedestrian’: 1926
b’road_sign’: 1564
b’car’: 3946
b’cyclist’: 139

2021-04-28 06:41:12,124 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 0
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/07379.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06858.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06856.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06872.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06855.txt”
2021-04-28 06:41:12,869 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 1
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06571.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06875.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/07403.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/00653.txt”
2021-04-28 06:41:13,620 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 2
2021-04-28 06:41:14,371 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 3
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06860.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06861.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/07401.txt”
2021-04-28 06:41:15,125 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 4
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/00654.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/07279.txt”
2021-04-28 06:41:15,880 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 5
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/07400.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/03094.txt”
2021-04-28 06:41:16,632 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 6
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06874.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06877.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/07405.txt”
2021-04-28 06:41:17,389 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 7
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/07402.txt”

/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/07278.txt”
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/00652.txt”
2021-04-28 06:41:18,132 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 8
2021-04-28 06:41:18,878 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 9
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:273: UserWarning: genfromtxt: Empty input file: “/workspace/tlt-experiments/data/training/label_2/06859.txt”
2021-04-28 06:41:19,628 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
b’car’: 22665
b’pedestrian’: 11074
b’road_sign’: 8679
b’cyclist’: 760

2021-04-28 06:41:19,628 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Cumulative object statistics
2021-04-28 06:41:19,628 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO -
Wrote the following numbers of objects:
b’pedestrian’: 13000
b’road_sign’: 10243
b’car’: 26611
b’cyclist’: 899

2021-04-28 06:41:19,628 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map.
Label in GT: Label in tfrecords file
b’Pedestrian’: b’pedestrian’
b’road_sign’: b’road_sign’
b’Car’: b’car’
b’Cyclist’: b’cyclist’
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.

2021-04-28 06:41:19,628 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Tfrecords generation complete.
2021-04-28 14:41:20,275 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

It reports that the TFRecord generation completed, so I assumed the conversion was successful, even though some label *.txt files are empty, as mentioned in the warnings above. That is what confuses me…

This is quite urgent for me, so I would really appreciate any suggestions. Thanks.

Here is my training config:
random_seed: 42
dataset_config {
data_sources {
tfrecords_path: “/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*”
image_directory_path: “/workspace/tlt-experiments/data/training”
}
image_extension: “jpg”
target_class_mapping {
key: “Car”
value: “car”
}
target_class_mapping {
key: “Cyclist”
value: “cyclist”
}
target_class_mapping {
key: “Pedestrian”
value: “pedestrian”
}
target_class_mapping {
key: “person_sitting”
value: “pedestrian”
}
target_class_mapping {
key: “van”
value: “car”
}
target_class_mapping {
key: “road_sign”
value: “road_sign”
}

validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 960
output_image_height: 544
crop_right: 960
crop_bottom: 544
min_bbox_width: 1.0
min_bbox_height: 1.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: “Car”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: “Cyclist”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.15000000596
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: “Pedestrian”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
}
model_config {
pretrained_model_file: “/workspace/tlt-experiments/detectnet_v2/pretrained_trafficcamnet_Ucit/tlt_trafficcamnet_unpruned_v1.0/resnet18_trafficcamnet.tlt”
num_layers: 18
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
training_precision {
backend_floatx: FLOAT32
}
arch: “resnet”
all_projections:true
}
evaluation_config {
validation_period_during_training: 3
first_validation_epoch: 1
minimum_detection_ground_truth_overlap {
key: “Car”
value: 0.699999988079
}
minimum_detection_ground_truth_overlap {
key: “Cyclist”
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: “Pedestrian”
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: “road_sign”
value: 0.5
}
evaluation_box_config {
key: “Car”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “Cyclist”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “Pedestrian”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “road_sign”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: “Car”
class_weight: 1.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: “Cyclist”
class_weight: 8.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: “Pedestrian”
class_weight: 4.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: “road_sign”
class_weight: 4.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}

enable_autoweighting: true
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}
training_config {
batch_size_per_gpu: 40
num_epochs: 120
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-06
max_learning_rate: 5e-04
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 3
}
bbox_rasterizer_config {
target_class_config {
key: “Car”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
target_class_config {
key: “Cyclist”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: “Pedestrian”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: “road_sign”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}

I also have a question about the input image resolution. I have seen it mentioned that the dataset resolution should match the output_image_height and output_image_width set in the config.
However, I previously trained a ResNet-18 with a different dataset configured at a different size, and it trained and evaluated fine, so I am not sure whether this is actually an issue in version 3.0.

@Morganh, please help me take a look. Thanks in advance.

Please modify all of the class names in your training config file to lowercase.

For example,

target_class_mapping {
key: “Pedestrian”
value: “pedestrian”
}

to

target_class_mapping {
key: “pedestrian”
value: “pedestrian”
}

I will try it soon. By the way, the class names I wrote in the label *.txt files are capitalized, like Car and Pedestrian. Is that okay? Thanks for your reply.

That should be OK; the tfrecords files contain lowercase class names only.
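If you want to verify this yourself, below is a rough (unofficial) sketch that dumps the class-name strings stored in the generated tfrecords. The glob path is taken from your spec, and the "class"-in-key filter is an assumption about the feature naming, so adjust as needed.

import glob
import tensorflow as tf

# Spot-check the class names stored in the generated tfrecords. Each record is
# a serialized tf.train.Example; print every bytes_list value found under a
# feature key containing "class" (assumed naming).
classes = set()
for path in glob.glob("/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"):
    for record in tf.compat.v1.io.tf_record_iterator(path):
        example = tf.train.Example()
        example.ParseFromString(record)
        for key, feature in example.features.feature.items():
            if "class" in key and feature.bytes_list.value:
                classes.update(v.decode() for v in feature.bytes_list.value)

print(classes)  # expected to show lowercase names only, e.g. car, pedestrian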

BTW, please note that the train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/open_model_architectures.html#detectnet-v2
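If it helps, here is a rough offline-resize sketch (not part of the TLT tooling): it resizes every image to the training resolution and scales the KITTI bbox columns by the same factors. The directory names, the .jpg extension, and the 960x544 target are assumptions taken from your spec and logs, and it overwrites files in place, so run it on a copy of the dataset.

import glob
import os
from PIL import Image

IMG_DIR = "/workspace/tlt-experiments/data/training/image_2"   # assumed layout
LBL_DIR = "/workspace/tlt-experiments/data/training/label_2"
TARGET_W, TARGET_H = 960, 544                                  # from the spec

for img_path in glob.glob(os.path.join(IMG_DIR, "*.jpg")):
    img = Image.open(img_path)
    sx, sy = TARGET_W / float(img.width), TARGET_H / float(img.height)
    img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(img_path)

    # Scale the matching KITTI label: bbox is xmin, ymin, xmax, ymax in fields 4-7.
    lbl_path = os.path.join(LBL_DIR, os.path.splitext(os.path.basename(img_path))[0] + ".txt")
    if not os.path.exists(lbl_path):
        continue
    rows = []
    with open(lbl_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 8:
                parts[4] = "%.2f" % (float(parts[4]) * sx)
                parts[5] = "%.2f" % (float(parts[5]) * sy)
                parts[6] = "%.2f" % (float(parts[6]) * sx)
                parts[7] = "%.2f" % (float(parts[7]) * sy)
            rows.append(" ".join(parts))
    with open(lbl_path, "w") as f:
        f.write("\n".join(rows) + "\n")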

Sorry Morganh, the issue is still there after I changed all the class names in the config file from uppercase to lowercase. The point is that the error pops up during the evaluation step, not during the training process; isn't that a little weird?

I also noticed that the TFRecord conversion log reports some empty label *.txt files during the conversion. Could that be causing the problem? In any case, I am going to filter out all empty *.txt files and remove the corresponding images at the same time, e.g. with the script below. If it works, I will let you know.
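Here is the rough script I intend to use for that filtering (directory names are guessed from the log paths, and I am assuming the images are .jpg):

import glob
import os
import shutil

LBL_DIR = "/workspace/tlt-experiments/data/training/label_2"   # assumed layout
IMG_DIR = "/workspace/tlt-experiments/data/training/image_2"
TRASH = "/workspace/tlt-experiments/data/training/removed"     # moved here instead of deleted
os.makedirs(TRASH, exist_ok=True)

for lbl in glob.glob(os.path.join(LBL_DIR, "*.txt")):
    with open(lbl) as f:
        content = f.read().strip()
    if not content:                                            # empty label file
        stem = os.path.splitext(os.path.basename(lbl))[0]
        shutil.move(lbl, TRASH)
        img = os.path.join(IMG_DIR, stem + ".jpg")
        if os.path.exists(img):
            shutil.move(img, TRASH)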

Init COMPLETE
2021-04-28 07:53:37,043 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 37, 0.00s/step

Traceback (most recent call last):
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 797, in
File “”, line 2, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py”, line 46, in wrapped_fn
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 790, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 691, in run_experiment
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 624, in train_gridbox
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 149, in run_training_loop
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 754, in run
run_metadata=run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1360, in run
raise six.reraise(*original_exc_info)
File “/usr/local/lib/python3.6/dist-packages/six.py”, line 696, in reraise
raise value
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1345, in run
return self._sess.run(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1426, in run
run_metadata=run_metadata))
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py”, line 79, in after_run
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py”, line 85, in validate
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/evaluation/evaluation.py”, line 165, in evaluate
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/postprocessor/postprocessing.py”, line 146, in cluster_predictions
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/postprocessor/cluster.py”, line 45, in cluster_predictions
AssertionError
Traceback (most recent call last):
File “/usr/local/bin/detectnet_v2”, line 8, in
sys.exit(main())
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/entrypoint/detectnet_v2.py”, line 12, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py”, line 296, in launch_job
AssertionError: Process run failed.
2021-04-28 15:53:52,762 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Can you share your latest training spec file?

random_seed: 42
dataset_config {
data_sources {
tfrecords_path: “/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*”
image_directory_path: “/workspace/tlt-experiments/data/training”
}
image_extension: “jpg”
target_class_mapping {
key: “car”
value: “car”
}
target_class_mapping {
key: “cyclist”
value: “cyclist”
}
target_class_mapping {
key: “pedestrian”
value: “pedestrian”
}
target_class_mapping {
key: “person_sitting”
value: “pedestrian”
}
target_class_mapping {
key: “van”
value: “car”
}
target_class_mapping {
key: “road_sign”
value: “road_sign”
}

validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 960
output_image_height: 544
crop_right: 960
crop_bottom: 544
min_bbox_width: 1.0
min_bbox_height: 1.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: “car”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: “cyclist”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.15000000596
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: “pedestrian”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
}
model_config {
pretrained_model_file: “/workspace/tlt-experiments/detectnet_v2/pretrained_trafficcamnet_Ucit/tlt_trafficcamnet_unpruned_v1.0/resnet18_trafficcamnet.tlt”
num_layers: 18
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
training_precision {
backend_floatx: FLOAT32
}
arch: “resnet”
all_projections:true
}
evaluation_config {
validation_period_during_training: 3
first_validation_epoch: 1
minimum_detection_ground_truth_overlap {
key: “car”
value: 0.699999988079
}
minimum_detection_ground_truth_overlap {
key: “cyclist”
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: “pedestrian”
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: “road_sign”
value: 0.5
}
evaluation_box_config {
key: “car”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “cyclist”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “pedestrian”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “road_sign”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: “car”
class_weight: 1.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: “cyclist”
class_weight: 8.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: “pedestrian”
class_weight: 4.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: “road_sign”
class_weight: 4.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}

enable_autoweighting: true
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}
training_config {
batch_size_per_gpu: 40
num_epochs: 120
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-06
max_learning_rate: 5e-04
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 3
}
bbox_rasterizer_config {
target_class_config {
key: “car”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
target_class_config {
key: “cyclist”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: “pedestrian”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: “road_sign”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}

You are missing “road_sign” in the postprocessing_config section.
Please add it.
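For example, mirroring the existing entries (the threshold values below are simply copied from the pedestrian block as a starting point):

target_class_config {
  key: "road_sign"
  value {
    clustering_config {
      clustering_algorithm: DBSCAN
      dbscan_confidence_threshold: 0.9
      coverage_threshold: 0.00749999983236
      dbscan_eps: 0.230000004172
      dbscan_min_samples: 0.0500000007451
      minimum_bounding_box_height: 20
    }
  }
}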

Thanks!!!
The first validation now passes; the problem was indeed the missing road_sign entry in the postprocessing config. How careless of me! It works now, thank you so much.

@Morganh, by the way, could you please tell me why I got the evaluation results below after the first epoch finished? The average precision is all 0 or nan. Does that reflect the model's real detection performance? It seems strange, since I initialized from the pretrained TrafficCamNet model; it shouldn't perform this poorly, should it?

cedda2304988:64:108 [0] NCCL INFO Channel 00/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 01/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 02/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 03/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 04/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 05/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 06/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 07/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 08/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 09/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 10/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 11/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 12/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 13/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 14/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 15/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 16/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 17/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 18/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 19/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 20/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 21/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 22/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 23/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 24/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 25/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 26/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 27/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 28/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 29/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 30/32 : 0
cedda2304988:64:108 [0] NCCL INFO Channel 31/32 : 0
cedda2304988:64:108 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [1] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [2] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [3] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [4] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [5] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [6] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [7] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [8] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [9] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [10] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [11] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [12] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [13] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [14] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [15] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [16] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [17] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [18] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [19] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [20] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [21] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [22] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [23] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [24] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [25] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [26] -1/-1/-1->0->-1|-1->0->-1/
cedda2304988:64:108 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
cedda2304988:64:108 [0] NCCL INFO comm 0x7f072838fce0 rank 0 nranks 1 cudaDev 0 busId 1000 - Init COMPLETE
2021-04-28 08:23:59,402 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 37, 0.00s/step
2021-04-28 08:26:24,874 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 37, 14.55s/step
2021-04-28 08:28:44,995 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 37, 14.01s/step
2021-04-28 08:31:05,170 [INFO] iva.detectnet_v2.evaluation.evaluation: step 30 / 37, 14.02s/step
Matching predictions to ground truth, class 1/4.: 100%|█| 26390/26390 [00:00<00:00, 103937.46it/s]
Matching predictions to ground truth, class 2/4.: 100%|█| 15333/15333 [00:00<00:00, 121314.08it/s]
Matching predictions to ground truth, class 3/4.: 100%|█| 180036/180036 [00:01<00:00, 121380.37it/s]
Matching predictions to ground truth, class 4/4.: 100%|█| 607242/607242 [00:23<00:00, 25328.65it/s]
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/evaluation/compute_metrics.py:721: RuntimeWarning: invalid value encountered in true_divide
Epoch 1/120

Validation cost: 0.181464
Mean average_precision (in %): nan

class name average precision (in %)


car 0
cyclist nan
pedestrian 0
road_sign 0

Median Inference Time: 0.007478
INFO:tensorflow:epoch = 1.0, loss = 0.18259971, step = 215 (563.058 sec)
2021-04-28 08:33:18,740 [INFO] tensorflow: epoch = 1.0, loss = 0.18259971, step = 215 (563.058 sec)
2021-04-28 08:33:18,741 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 1/120: loss: 0.18260 Time taken: 0:11:34.042830 ETA: 22:56:31.096797
2021-04-28 08:33:24,004 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 1.744
INFO:tensorflow:epoch = 1.0465116279069768, loss = 0.16593793, step = 225 (5.883 sec)
2021-04-28 08:33:24,624 [INFO] tensorflow: epoch = 1.0465116279069768, loss = 0.16593793, step = 225 (5.883 sec)
INFO:tensorflow:global_step/sec: 0.0367631
2021-04-28 08:33:28,070 [INFO] tensorflow: global_step/sec: 0.0367631
INFO:tensorflow:epoch = 1.0930232558139534, loss = 0.15178509, step = 235 (5.781 sec)
2021-04-28 08:33:30,405 [INFO] tensorflow: epoch = 1.0930232558139534, loss = 0.15178509, step = 235 (5.781 sec)
INFO:tensorflow:epoch = 1.1395348837209303, loss = 0.14174712, step = 245 (5.841 sec)
2021-04-28 08:33:36,246 [INFO] tensorflow: epoch = 1.1395348837209303, loss = 0.14174712, step = 245 (5.841 sec)
2021-04-28 08:33:38,651 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 68.273
INFO:tensorflow:global_step/sec: 1.68416
2021-04-28 08:33:40,539 [INFO] tensorflow: global_step/sec: 1.68416

Do you mean you want to directly check the evaluation result against the TrafficCamNet pretrained model from NGC?

In my view, the model I trained for even one epoch should perform better than, or at least no worse than, the pretrained model, provided the dataset is of good quality. However, it looks as if the model was trained from scratch: the precision is as low as zero, and I am not sure why.

Of course, it would be even better if there were a way to directly compare the evaluation of the model I trained against the pretrained TrafficCamNet.

I also checked the TrafficCamNet description referenced here: https://ngc.nvidia.com/catalog/models/nvidia:tlt_trafficcamnet

It mentions:

Description

4 class object detection network to detect cars in an image.

I downloaded the labels.txt that comes with the pruned model; it lists the four classes as car, bicycle, person, road_sign, in that order. I am now wondering whether I labelled my data with the wrong class names!

If you want to use the NGC pretrained model to run evaluation against your dataset, two major changes are needed in your spec:

load_graph: True

min_learning_rate: 10e-10
max_learning_rate: 10e-10

It will then directly load the pretrained model and run evaluation.
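In other words, only these fragments of the spec change (shown as a sketch; keep the rest of your spec as-is):

model_config {
  load_graph: true
  # ... rest of model_config unchanged
}
training_config {
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 10e-10
      max_learning_rate: 10e-10
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  # ... rest of training_config unchanged
}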

Hi, I evaluated the pretrained TrafficCamNet model against my custom dataset, but I got the following results:

INFO:tensorflow:Graph was finalized.
2021-04-29 09:24:43,685 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2021-04-29 09:24:44,356 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2021-04-29 09:24:44,590 [INFO] tensorflow: Done running local_init_op.
2021-04-29 09:24:45,087 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 2, 0.00s/step
Matching predictions to ground truth, class 1/4.: 100%|█| 363/363 [00:00<00:00, 110352.42it/s]
Matching predictions to ground truth, class 3/4.: 100%|█| 6/6 [00:00<00:00, 31261.89it/s]
Matching predictions to ground truth, class 4/4.: 100%|█| 1/1 [00:00<00:00, 5059.47it/s]
/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/evaluation/compute_metrics.py:721: RuntimeWarning: invalid value encountered in true_divide
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

2021-04-29 09:24:53,220 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:98: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

2021-04-29 09:24:53,220 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:98: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

Validation cost: 0.000168
Mean average_precision (in %): nan

class name average precision (in %)


bicycle 0
car nan
person nan
road_sign 0

Median Inference Time: 0.099959
2021-04-29 09:24:53,259 [INFO] main: Evaluation complete.
Time taken to run main:main: 0:00:12.919814.
2021-04-29 17:24:55,479 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

All nan or 0, but I am sure the pretrained model can run inference on my dataset and generates annotated images correctly. Did I miss something in my spec?

random_seed: 42
dataset_config {
data_sources {
tfrecords_path: “/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*”
image_directory_path: “/workspace/tlt-experiments/data/training”
}
image_extension: “jpg”
target_class_mapping {
key: “car”
value: “car”
}
target_class_mapping {
key: “bicycle”
value: “bicycle”
}
target_class_mapping {
key: “person”
value: “person”
}
target_class_mapping {
key: “road_sign”
value: “road_sign”
}

validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 960
output_image_height: 544
crop_right: 960
crop_bottom: 544
min_bbox_width: 1.0
min_bbox_height: 1.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
zoom_min: 0.8
zoom_max: 2.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: “car”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: “bicycle”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.15000000596
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: “person”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: “road_sign”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
}
model_config {
pretrained_model_file: “/workspace/tlt-experiments/detectnet_v2/pretrained_trafficcamnet_Ucit/tlt_trafficcamnet_unpruned_v1.0/resnet18_trafficcamnet.tlt”
num_layers: 18
load_graph: true
use_batch_norm: false

activation {
activation_type: “relu”
}
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
training_precision {
backend_floatx: FLOAT32
}
arch: “resnet”
}
evaluation_config {
validation_period_during_training: 1
first_validation_epoch: 1
minimum_detection_ground_truth_overlap {
key: “car”
value: 0.699999988079
}
minimum_detection_ground_truth_overlap {
key: “bicycle”
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: “person”
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: “road_sign”
value: 0.5
}
evaluation_box_config {
key: “car”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “bicycle”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “person”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: “road_sign”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: “car”
class_weight: 1.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: “bicycle”
class_weight: 8.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: “person”
class_weight: 4.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: “road_sign”
class_weight: 4.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}

enable_autoweighting: true
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}
training_config {
batch_size_per_gpu: 40
num_epochs: 12
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 10e-10
max_learning_rate: 10e-10
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 3
}
bbox_rasterizer_config {
target_class_config {
key: “car”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
target_class_config {
key: “bicycle”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: “person”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: “road_sign”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}

That is because the class names from TrafficCamNet do not match the class names in your dataset.

As you mentioned, the pretrained model can run inference on your dataset and generate annotated images, so please check the 4 class names in those output label files.
Then your dataset and training spec must contain the exact class names from TrafficCamNet.
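A quick way to check is to count the class names that actually appear in your KITTI label files and compare them with TrafficCamNet's classes (car, bicycle, person, road_sign). A rough sketch, with the label directory assumed from your earlier logs:

import collections
import glob
import os

LBL_DIR = "/workspace/tlt-experiments/data/training/label_2"   # assumed layout

# Count the first field (class name) of every row in every KITTI label file.
counts = collections.Counter()
for lbl in glob.glob(os.path.join(LBL_DIR, "*.txt")):
    with open(lbl) as f:
        for line in f:
            parts = line.split()
            if parts:
                counts[parts[0]] += 1

print(counts)  # these names must match the keys used in the training spec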