Mean average precision of 0.00 when training a TrafficCamNet model with TAO Toolkit

Please provide the following information when requesting support.

• Hardware (NVIDIA A10)
• Network Type ( DetectNet_v2 detector with ResNet18)
• TLT Version (nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3)
• Training spec file (spec.txt, 6.4 KB)

Input images are in JPG format with different resolutions (1080x720, 1920x1080, etc.).

2025-01-09 13:25:11,940 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1110 / 1201, 1.43s/step
2025-01-09 13:25:26,209 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1120 / 1201, 1.43s/step
2025-01-09 13:25:41,289 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1130 / 1201, 1.51s/step
2025-01-09 13:25:55,287 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1140 / 1201, 1.40s/step
2025-01-09 13:26:09,981 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1150 / 1201, 1.47s/step
2025-01-09 13:26:24,005 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1160 / 1201, 1.40s/step
2025-01-09 13:26:38,006 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1170 / 1201, 1.40s/step
2025-01-09 13:26:52,730 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1180 / 1201, 1.47s/step
2025-01-09 13:27:06,819 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1190 / 1201, 1.41s/step
2025-01-09 13:27:21,501 [INFO] iva.detectnet_v2.evaluation.evaluation: step 1200 / 1201, 1.47s/step
Matching predictions to ground truth, class 1/4.: 100%|████████████████████████████████████| 4/4 [00:00<00:00, 28679.00it/s]
Epoch 1/20
=========================

Validation cost: 0.005425
Mean average_precision (in %): 0.0000

class name       average precision (in %)
-------------  --------------------------
four_wheeler                            0
heavy                                   0
three_wheeler                           0
two_wheeler                             0

Median Inference Time: 0.008612
INFO:tensorflow:epoch = 1.0, learning_rate = 5.000004e-06, loss = 0.0053756707, step = 7383 (1733.481 sec)
2025-01-09 13:27:23,069 [INFO] tensorflow: epoch = 1.0, learning_rate = 5.000004e-06, loss = 0.0053756707, step = 7383 (1733.481 sec)
2025-01-09 13:27:23,070 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 1/20: loss: 0.00538 learning rate: 0.00001 Time taken: 0:40:05.801975 ETA: 12:41:50.237516
2025-01-09 13:27:24,461 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 0.058
2025-01-09 13:27:26,667 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 45.345
INFO:tensorflow:epoch = 1.0078558851415413, learning_rate = 5.0912686e-06, loss = 0.005273092, step = 7441 (5.113 sec)
2025-01-09 13:27:28,182 [INFO] tensorflow: epoch = 1.0078558851415413, learning_rate = 5.0912686e-06, loss = 0.005273092, step = 7441 (5.113 sec)

Training log :
training.log (122.7 KB)

Can someone please take a look at the files and let me know if I have done something wrong? I looked at similar posts and followed their solutions but didn't get anywhere, so I am not sure if I am missing something obvious.

Could you please run with nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5?
You can start the container with
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash
and then run the following inside the container:
# detectnet_v2 train xxx

Also, please set enable_auto_resize: true in the spec file.

Should I add enable_auto_resize: true to the augmentation config? I am asking because the following error occurred.
ERROR:

Traceback (most recent call last):
  File "</usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/scripts/train.py>", line 3, in <module>
  File "<frozen iva.detectnet_v2.scripts.train>", line 1032, in <module>
  File "<frozen iva.detectnet_v2.scripts.train>", line 1011, in <module>
  File "<decorator-gen-117>", line 2, in main
  File "<frozen iva.detectnet_v2.utilities.timer>", line 46, in wrapped_fn
  File "<frozen iva.detectnet_v2.scripts.train>", line 994, in main
  File "<frozen iva.detectnet_v2.scripts.train>", line 787, in run_experiment
  File "<frozen iva.detectnet_v2.spec_handler.spec_loader>", line 125, in load_experiment_spec
  File "<frozen iva.detectnet_v2.spec_handler.spec_loader>", line 102, in load_proto
  File "<frozen iva.detectnet_v2.spec_handler.spec_loader>", line 88, in _load_from_file
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 725, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 793, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 818, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 837, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 967, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 1042, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 934, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 47:3 : Message type "AugmentationConfig" has no field named "enable_auto_resize".
Execution status: FAIL
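
Editor's note: the ParseError above carries a location, "47:3", which protobuf's text_format reports as line:column within the spec file. The message also names "AugmentationConfig" itself rather than a nested preprocessing message, which suggests (an inference from the error text, not something confirmed in this thread) that the field was read at the top level of augmentation_config. A minimal sketch for pulling out the location and printing the offending spec line follows; the helper names parse_error_location and show_spec_line are hypothetical, not part of TAO or protobuf.

```python
import re
from typing import Optional, Tuple

# text_format.ParseError renders as "<line>:<column> : <message>" (1-based).
_LOCATION = re.compile(r"(\d+):(\d+)\s*:\s*(.*)")


def parse_error_location(error_text: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) from a ParseError string, or None."""
    match = _LOCATION.search(error_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def show_spec_line(spec_text: str, line: int, column: int) -> str:
    """Return the reported spec line with a caret under the reported column."""
    lines = spec_text.splitlines()
    if not 1 <= line <= len(lines):
        return f"line {line} is outside the spec ({len(lines)} lines)"
    return f"{lines[line - 1]}\n{' ' * (column - 1)}^"
```

To use it, read spec.txt into a string and pass the line and column from the error; the caret then marks the field the parser rejected, which makes it easy to see which block that field actually sits in.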

I also tried adding enable_auto_resize: true to dataset_config, but I still get the same error.

I tried with enable_auto_resize: true in the augmentation config:

augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
    enable_auto_resize: true
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

I ran this inside the container nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5
but still got the same error.

Labels example :

labels/Em_img_160724_5883.txt
heavy 0.00 0 0.00 511.00 0.00 597.00 46.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
two_wheeler 0.00 0 0.00 388.00 35.00 425.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
two_wheeler 0.00 0 0.00 32.00 11.00 76.00 60.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
four_wheeler 0.00 0 0.00 129.00 0.00 219.00 43.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
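
Editor's note: the label lines above follow the 15-field KITTI layout (class name, truncation, occlusion, alpha, then x1 y1 x2 y2, followed by seven 3D fields that are zero here). A small sketch for reading one line and getting the box width and height in pixels is below; KittiBox and parse_kitti_line are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class KittiBox(NamedTuple):
    class_name: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def parse_kitti_line(line: str) -> KittiBox:
    """Parse one KITTI label line, keeping only the class and the 2D box."""
    fields = line.split()
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
    # Fields 4..7 (0-based) are the box corners: left, top, right, bottom.
    x1, y1, x2, y2 = (float(value) for value in fields[4:8])
    return KittiBox(fields[0], x1, y1, x2, y2)
```

For the first line of the example, this gives a "heavy" box that is 86 pixels wide and 46 pixels high in the source image.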

@Morganh could you please suggest a solution and tell me whether I am missing something?

I will run with your data to try to reproduce the issue. Could you share the data?

enable_auto_resize should be available in the 4.0.1 docker. Please also refer to the 4.0.1 documentation (DetectNet_v2 - NVIDIA Docs).
Could you please double check?

Did it run when you tried? When I tried, the model appears to be training but is not actually learning: it reports Mean average_precision (in %): 0.0000.

I am sharing the data :
data(images, labels).zip (717.6 KB)

So the error google.protobuf.text_format.ParseError: 47:3 : Message type "AugmentationConfig" has no field named "enable_auto_resize". no longer occurs when you run with the 4.0.1 docker, right?
Can you share your latest training log?

I just ran the training; the logs are below.
training.log (122.7 KB)

These are the logs from the start of training, but after completing 1 epoch the result is still Mean average_precision (in %): 0.0000.

container name : nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5

From your log, the training loss keeps decreasing, so there is no issue during training.
But it looks like there are problems in the evaluation.
How many test images are there? Did you save the tfrecord conversion log?

I found this in the log: 2025-01-09 12:46:53,950 [INFO] __main__: Found 4807 samples in validation set.

Please share the tfrecord conversion log if you saved it.
Also, please share several label files from the validation dataset.

I did not pass test images as a separate folder; instead I set val_split to 14:

kitti_config {
  root_directory_path: "/home/trainval/latest_training"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}

tfrecords generation command: tlt-dataset-convert -d /home/trainval/latest_training/kitty.txt -o /home/trainval/latest_training/tfrecords/
The container I am using for it is nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3
tfrecords logs :
tfrecords.log (3.9 MB)

current folder structure is :

-/home/trainval/latest_training/
--/home/trainval/latest_training/images/ {all images with jpg extension}
--/home/trainval/latest_training/labels/ {all labels corresponding to images}
--/home/trainval/latest_training/tfrecords/ {tfrecords created with command "tlt-dataset-convert -d /home/trainval/latest_training/kitty.txt -o /home/trainval/latest_training/tfrecords/"} 
--/home/trainval/latest_training/model/resnet18_trafficcamnet.tlt {input model}
--/home/trainval/latest_training/model/weights/ {where final weights created}
--/home/trainval/latest_training/spec.txt 
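
Editor's note: with an images/ and labels/ layout like the one above, one quick sanity check before converting to tfrecords is that every image has a label file with the same base name, and vice versa. The sketch below assumes that naming convention (image "foo.jpg" pairs with label "foo.txt") and uses a hypothetical helper name, find_unpaired; it is not a TAO utility.

```python
from pathlib import Path
from typing import List, Tuple


def find_unpaired(root: str, image_ext: str = ".jpg") -> Tuple[List[str], List[str]]:
    """Return (images without labels, labels without images) by base name."""
    base = Path(root)
    images = {p.stem for p in (base / "images").glob(f"*{image_ext}")}
    labels = {p.stem for p in (base / "labels").glob("*.txt")}
    return sorted(images - labels), sorted(labels - images)
```

Calling find_unpaired("/home/trainval/latest_training") would return two lists of base names; both being empty means the images and labels line up by name.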

spec.txt (6.4 KB)

process :

  1. create the tfrecords: tlt-dataset-convert -d /home/trainval/latest_training/kitty.txt -o /home/trainval/latest_training/tfrecords/
  2. training: detectnet_v2 train -e spec.txt -r /home/trainval/latest_training/model -k tlt_encode

Can you run the command below?
$ ll -rltsh /home/trainval/latest_training/tfrecords/

ll -rltsh /home/trainval/latest_training/tfrecords/
total 20M
4.0K drwxr-xr-x 7 root root 4.0K Jan 13 06:50 ../
276K -rw-r--r-- 1 root root 274K Jan 13 06:50 -fold-000-of-002-shard-00000-of-00010
276K -rw-r--r-- 1 root root 274K Jan 13 06:50 -fold-000-of-002-shard-00001-of-00010
276K -rw-r--r-- 1 root root 276K Jan 13 06:50 -fold-000-of-002-shard-00002-of-00010
272K -rw-r--r-- 1 root root 271K Jan 13 06:50 -fold-000-of-002-shard-00003-of-00010
276K -rw-r--r-- 1 root root 276K Jan 13 06:50 -fold-000-of-002-shard-00004-of-00010
276K -rw-r--r-- 1 root root 275K Jan 13 06:50 -fold-000-of-002-shard-00005-of-00010
276K -rw-r--r-- 1 root root 275K Jan 13 06:50 -fold-000-of-002-shard-00006-of-00010
276K -rw-r--r-- 1 root root 275K Jan 13 06:50 -fold-000-of-002-shard-00007-of-00010
276K -rw-r--r-- 1 root root 274K Jan 13 06:50 -fold-000-of-002-shard-00008-of-00010
280K -rw-r--r-- 1 root root 278K Jan 13 06:50 -fold-000-of-002-shard-00009-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00000-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00001-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00002-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00003-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00004-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00005-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00006-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00007-of-00010
4.0K drwxr-xr-x 2 root root 4.0K Jan 13 06:50 ./
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00008-of-00010
1.7M -rw-r--r-- 1 root root 1.7M Jan 13 06:50 -fold-001-of-002-shard-00009-of-00010
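
Editor's note: in the listing above, the fold-000 shards add up to roughly 2.7 MB and the fold-001 shards to roughly 17 MB, so fold 0 holds about 14% of the bytes, which is in line with val_split: 14. Shard size is only a rough proxy for sample count, so treat this as a sanity check rather than an exact measurement. A sketch that computes the same ratio from a tfrecords directory is below; bytes_per_fold and fold_fractions are hypothetical helper names, and the file name pattern is taken from the listing.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict

# Matches names such as "-fold-000-of-002-shard-00003-of-00010".
_SHARD = re.compile(r"-fold-(\d+)-of-(\d+)-shard-(\d+)-of-(\d+)$")


def bytes_per_fold(tfrecords_dir: str) -> Dict[int, int]:
    """Sum the on-disk size of all shard files, grouped by fold index."""
    totals: Dict[int, int] = defaultdict(int)
    for path in Path(tfrecords_dir).iterdir():
        match = _SHARD.search(path.name)
        if match and path.is_file():
            totals[int(match.group(1))] += path.stat().st_size
    return dict(totals)


def fold_fractions(totals: Dict[int, int]) -> Dict[int, float]:
    """Convert per-fold byte totals into fractions of the whole dataset."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {fold: size / grand_total for fold, size in totals.items()}
```

For example, fold_fractions(bytes_per_fold("/home/trainval/latest_training/tfrecords/")) returns a dictionary keyed by fold index, and the smaller fraction should sit close to val_split / 100.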

Please lower all of the following settings:
minimum_bounding_box_height, for example to 4.
minimum_height, for example to 4.
minimum_width, for example to 4.

Then run evaluation directly.
# detectnet_v2 evaluate xxx

There is no need to rerun training.
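
Editor's note: one way to see why lowering the minimum box sizes can matter is to estimate how large the labelled boxes are once the image is resized to the 960x544 network input. The sketch below is an illustration under stated assumptions, not TAO's actual filtering code: it assumes the thresholds are compared against box sizes at the network input resolution, and the threshold of 20 used in the example is hypothetical (the thread does not state the original values). The helper names are also hypothetical.

```python
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in source-image pixels


def resized_size(box: Box, src_w: int, src_h: int,
                 dst_w: int = 960, dst_h: int = 544) -> Tuple[float, float]:
    """Return (width, height) of a box after the image is resized to dst size."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * dst_w / src_w, (y2 - y1) * dst_h / src_h


def boxes_below_minimum(boxes: Iterable[Box], src_w: int, src_h: int,
                        min_w: float, min_h: float,
                        dst_w: int = 960, dst_h: int = 544) -> List[Box]:
    """Return boxes whose resized width or height is under the thresholds."""
    small = []
    for box in boxes:
        width, height = resized_size(box, src_w, src_h, dst_w, dst_h)
        if width < min_w or height < min_h:
            small.append(box)
    return small
```

As an example, a 40x30 pixel box in a 1920x1080 image shrinks to about 20x15 pixels at 960x544. Under the stated assumptions it would fall below a hypothetical minimum height of 20 but comfortably pass a minimum of 4.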

root@4b832b42af2b:/home/trainval/latest_training# detectnet_v2 evaluate -e /home/trainval/latest_training/spec.txt -m /home/trainval/latest_training/model/model.step-0.tlt -k tlt_encode
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

2025-01-13 07:13:44,075 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/trainval/latest_training/spec.txt
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2025-01-13 07:13:44,082 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2025-01-13 07:13:44,406 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2025-01-13 07:13:44,419 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2025-01-13 07:13:44,442 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2025-01-13 07:13:45,056 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:181: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2025-01-13 07:13:45,056 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:181: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:186: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2025-01-13 07:13:45,056 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:186: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2025-01-13 07:13:47,205 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2025-01-13 07:13:47,205 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2025-01-13 07:13:47,710 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
2025-01-13 07:13:48,025 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
2025-01-13 07:13:49,422 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2025-01-13 07:13:49,423 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2025-01-13 07:13:49,423 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2025-01-13 07:13:49,423 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 64, io threads: 128, compute threads: 64, buffered batches: 4
2025-01-13 07:13:49,423 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 4807, number of sources: 1, batch size per gpu: 4, steps: 1202
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

2025-01-13 07:13:49,599 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f6c05d520f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f6c05d520f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2025-01-13 07:13:49,637 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f6c05d520f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f6c05d520f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2025-01-13 07:13:49,654 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2025-01-13 07:13:49,867 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2025-01-13 07:13:49,872 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2025-01-13 07:13:49,873 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f6bf948ea20>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f6bf948ea20>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2025-01-13 07:13:49,886 [WARNING] tensorflow: Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f6bf948ea20>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f6bf948ea20>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/data_loaders/multi_source_loader/types/images2d_reference.py:362: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

2025-01-13 07:13:49,908 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/data_loaders/multi_source_loader/types/images2d_reference.py:362: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

2025-01-13 07:13:50,103 [INFO] iva.detectnet_v2.evaluation.build_evaluator: Found 4807 samples in validation set
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2025-01-13 07:13:50,104 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2025-01-13 07:13:50,104 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2025-01-13 07:13:50,106 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/rasterizers/bbox_rasterizer.py:347: The name tf.bincount is deprecated. Please use tf.math.bincount instead.

2025-01-13 07:13:50,213 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/rasterizers/bbox_rasterizer.py:347: The name tf.bincount is deprecated. Please use tf.math.bincount instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_functions.py:17: The name tf.log is deprecated. Please use tf.math.log instead.

2025-01-13 07:13:50,535 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_functions.py:17: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:235: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

2025-01-13 07:13:50,582 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:235: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 3, 544, 960)  0
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 272, 480) 9472        input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 272, 480) 256         conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 272, 480) 0           bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 136, 240) 36928       activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 136, 240) 256         block_1a_conv_1[0][0]
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (None, 64, 136, 240) 0           block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 136, 240) 36928       block_1a_relu_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160        activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 136, 240) 256         block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 136, 240) 256         block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 136, 240) 0           block_1a_bn_2[0][0]
                                                                 block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1a_relu (Activation)      (None, 64, 136, 240) 0           add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 136, 240) 36928       block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 136, 240) 256         block_1b_conv_1[0][0]
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (None, 64, 136, 240) 0           block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 136, 240) 36928       block_1b_relu_1[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 136, 240) 256         block_1b_conv_2[0][0]
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 136, 240) 0           block_1b_bn_2[0][0]
                                                                 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_relu (Activation)      (None, 64, 136, 240) 0           add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 68, 120) 73856       block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 68, 120) 512         block_2a_conv_1[0][0]
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (None, 128, 68, 120) 0           block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 68, 120) 147584      block_2a_relu_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320        block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 68, 120) 512         block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 68, 120) 512         block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 68, 120) 0           block_2a_bn_2[0][0]
                                                                 block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2a_relu (Activation)      (None, 128, 68, 120) 0           add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 68, 120) 147584      block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 68, 120) 512         block_2b_conv_1[0][0]
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (None, 128, 68, 120) 0           block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 68, 120) 147584      block_2b_relu_1[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 68, 120) 512         block_2b_conv_2[0][0]
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 68, 120) 0           block_2b_bn_2[0][0]
                                                                 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_relu (Activation)      (None, 128, 68, 120) 0           add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 34, 60)  295168      block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 34, 60)  1024        block_3a_conv_1[0][0]
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (None, 256, 34, 60)  0           block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 34, 60)  590080      block_3a_relu_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60)  33024       block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 34, 60)  1024        block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 34, 60)  1024        block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 34, 60)  0           block_3a_bn_2[0][0]
                                                                 block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3a_relu (Activation)      (None, 256, 34, 60)  0           add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 34, 60)  590080      block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 34, 60)  1024        block_3b_conv_1[0][0]
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (None, 256, 34, 60)  0           block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 34, 60)  590080      block_3b_relu_1[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 34, 60)  1024        block_3b_conv_2[0][0]
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 34, 60)  0           block_3b_bn_2[0][0]
                                                                 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_relu (Activation)      (None, 256, 34, 60)  0           add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 34, 60)  1180160     block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 34, 60)  2048        block_4a_conv_1[0][0]
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (None, 512, 34, 60)  0           block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 34, 60)  2359808     block_4a_relu_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60)  131584      block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 34, 60)  2048        block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 34, 60)  2048        block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 34, 60)  0           block_4a_bn_2[0][0]
                                                                 block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4a_relu (Activation)      (None, 512, 34, 60)  0           add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 34, 60)  2359808     block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 34, 60)  2048        block_4b_conv_1[0][0]
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (None, 512, 34, 60)  0           block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 34, 60)  2359808     block_4b_relu_1[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 34, 60)  2048        block_4b_conv_2[0][0]
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 34, 60)  0           block_4b_bn_2[0][0]
                                                                 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_relu (Activation)      (None, 512, 34, 60)  0           add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D)            (None, 16, 34, 60)   8208        block_4b_relu[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D)             (None, 4, 34, 60)    2052        block_4b_relu[0][0]
==================================================================================================
Total params: 11,205,588
Trainable params: 11,195,860
Non-trainable params: 9,728
__________________________________________________________________________________________________
WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py:139: The name tf.train.Scaffold is deprecated. Please use tf.compat.v1.train.Scaffold instead.

2025-01-13 07:13:50,594 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py:139: The name tf.train.Scaffold is deprecated. Please use tf.compat.v1.train.Scaffold instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:14: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

2025-01-13 07:13:50,595 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:14: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:15: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.

2025-01-13 07:13:50,595 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:15: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:16: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

2025-01-13 07:13:50,595 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:16: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py:140: The name tf.train.SingularMonitoredSession is deprecated. Please use tf.compat.v1.train.SingularMonitoredSession instead.

2025-01-13 07:13:50,596 [WARNING] tensorflow: From /opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py:140: The name tf.train.SingularMonitoredSession is deprecated. Please use tf.compat.v1.train.SingularMonitoredSession instead.

INFO:tensorflow:Graph was finalized.
2025-01-13 07:13:50,884 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2025-01-13 07:13:51,207 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2025-01-13 07:13:51,519 [INFO] tensorflow: Done running local_init_op.
2025-01-13 07:13:52,105 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 1202, 0.00s/step
2025-01-13 07:14:22,067 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 1202, 3.00s/step
2025-01-13 07:14:42,903 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 1202, 2.08s/step
2025-01-13 07:15:03,329 [INFO] iva.detectnet_v2.evaluation.evaluation: step 30 / 1202, 2.04s/step
2025-01-13 07:15:23,924 [INFO] iva.detectnet_v2.evaluation.evaluation: step 40 / 1202, 2.06s/step
2025-01-13 07:15:44,527 [INFO] iva.detectnet_v2.evaluation.evaluation: step 50 / 1202, 2.06s/step

These are the logs, but I think they are the same logs as before. After the evaluation completes, Mean average_precision (in %): 0.0000 appears again.
However, I ran the evaluation on weights from a single epoch of training.

Did the evaluation finish? If not, please wait for it.

Also, why does your latest spec file not contain enable_auto_resize: true? Since your dataset contains images of various resolutions, you are expected to set this parameter.
If you do not set it, you can instead resize all your images offline to the same resolution, along with their labels.
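
For the offline option, the key point is that each bounding box has to be scaled by the same factors as its image. Below is a minimal sketch of the label side only, assuming the labels are in KITTI text format (class name first, box corners xmin, ymin, xmax, ymax in fields 4 to 7) and assuming a 960x544 target, which is what the layer shapes in the model summary suggest (60 x 16 = 960, 34 x 16 = 544). The function names are hypothetical, not part of the toolkit. The images themselves would be resized separately to the same target, for example with an image library such as Pillow.

```python
def scale_factors(src_w, src_h, dst_w, dst_h):
    """Return the (x, y) factors that map source pixels to target pixels."""
    return dst_w / src_w, dst_h / src_h


def scale_kitti_line(line, sx, sy):
    """Scale the 2D box of one KITTI label line (assumed layout).

    Fields 4..7 are taken to be xmin, ymin, xmax, ymax in pixels.
    All other fields are passed through unchanged.
    """
    fields = line.split()
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    fields[4:8] = [
        f"{xmin * sx:.2f}",
        f"{ymin * sy:.2f}",
        f"{xmax * sx:.2f}",
        f"{ymax * sy:.2f}",
    ]
    return " ".join(fields)


# Illustrative label line (made up, not taken from the dataset in this thread).
example = "two_wheeler 0.00 0 0.00 100.00 200.00 300.00 400.00 0 0 0 0 0 0 0"

# A 1920x1080 image resized to the assumed 960x544 target.
sx, sy = scale_factors(1920, 1080, 960, 544)
print(scale_kitti_line(example, sx, sy))
```

Note that resizing 1920x1080 to 960x544 this way stretches the image slightly, because the two aspect ratios differ. The labels stay consistent with the image as long as both use the same sx and sy.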