Can't load pre-trained model for Retail Object Detection

• Hardware (T4/V100/Xavier/Nano/etc) - T4
• Network Type - EfficientDet-TF2
• TAO Version - toolkit_version: 4.0.1

dockers:
  nvidia/tao/tao-toolkit:
    4.0.0-tf2.9.1:
      docker_registry: nvcr.io
      tasks:
        1. classification_tf2
        2. efficientdet_tf2

• Training spec file (if you have one, please share it here)

data:
  loader:
    prefetch_size: 4
    shuffle_file: True
  num_classes: 97
  image_size: '416x416'
  max_instances_per_image: 10
  train_tfrecords:
    - '/workspace/tao-experiments/data/train/tf_records/train-*'
  val_tfrecords:
    - '/workspace/tao-experiments/data/val/tf_records/val-*'
  val_json_file: '/workspace/tao-experiments/data/val/annotations.json'
train:
  checkpoint: "/workspace/tao-experiments/data/efficientdet_tf2/retail_detector_100.tlt"
  num_examples_per_epoch: 1000
model:
  name: 'efficientdet-d5'
key: 'nvidia_tlt'
results_dir: '/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned'

• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
Run tao efficientdet_tf2 train -e $SPECS_DIR/spec_train.yaml --gpus 1

Hi!

I’m trying to fine-tune this model.
I was able to convert my COCO dataset to TFRecords using TAO, but when I try to run training, I get the following output:

2023-04-11 23:43:22,078 [INFO] root: Registry: ['nvcr.io']
2023-04-11 23:43:22,137 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1
2023-04-11 23:43:27.981470: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
[1681256610.283774] [ip-172-31-7-160:16   :f]        vfs_fuse.c:424  UCX  WARN  failed to connect to vfs socket '': Invalid argument
2023-04-11 23:43:30,643 [WARNING] matplotlib: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-hxa8n1c4 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2023-04-11 23:43:30,906 [INFO] matplotlib.font_manager: generated new fontManager
[1681256615.247612] [ip-172-31-7-160:324  :f]        vfs_fuse.c:424  UCX  WARN  failed to connect to vfs socket '': Invalid argument
<frozen common.hydra.hydra_runner>:87: UserWarning: 
'spec_train.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
Setting up communication with ClearML server.
ClearML task init failed with error ClearML configuration could not be found (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at https://app.clear.ml
Training will still continue.
Log file already exists at /workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned/status.json
Starting efficientdet training.
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7f0f2171ef70> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7f0f2171ef70>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7f0f2171ef70> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7f0f2171ef70>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f2282d1f0> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f2282d1f0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f2282d1f0> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f2282d1f0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
target_size = (416, 416), output_size = (416, 416)
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5040> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5040>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5040> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5040>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7f0f203d5310> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7f0f203d5310>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7f0f203d5310> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7f0f203d5310>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5430> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5430>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5430> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5430>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5550> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5550>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5550> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7f0f203d5550>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Building unpruned graph...
WARNING:tensorflow:AutoGraph could not transform <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x7f0e90644e80>> and will run it as-is.
Cause: Unable to locate the source code of <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x7f0e90644e80>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x7f0e90644e80>> and will run it as-is.
Cause: Unable to locate the source code of <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x7f0e90644e80>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x7f0f2003c880>> and will run it as-is.
Cause: Unable to locate the source code of <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x7f0f2003c880>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x7f0f2003c880>> and will run it as-is.
Cause: Unable to locate the source code of <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x7f0f2003c880>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
/usr/local/lib/python3.8/dist-packages/keras/backend.py:450: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
"The indicated 'retail_detector_100.tlt' artifact does not exist in the '/workspace/tao-experiments/data/efficientdet_tf2/retail_detector_100.tlt' registry"
Error executing job with overrides: []
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 368, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/clearml/binding/hydra_bind.py", line 88, in _patched_hydra_run
    return PatchHydra._original_hydra_run(self, config_name, task_function, overrides, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 110, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "/usr/local/lib/python3.8/dist-packages/clearml/binding/hydra_bind.py", line 170, in _patched_task_function
    return task_function(a_config, *a_args, **a_kwargs)
  File "<frozen cv.efficientdet.scripts.train>", line 229, in main
  File "<frozen common.decorators>", line 76, in _func
  File "<frozen common.decorators>", line 49, in _func
  File "<frozen cv.efficientdet.scripts.train>", line 108, in run_experiment
  File "<frozen cv.efficientdet.utils.helper>", line 61, in decode_eff
  File "<frozen eff.core.archive>", line 544, in restore_artifact
KeyError: "The indicated 'retail_detector_100.tlt' artifact does not exist in the '/workspace/tao-experiments/data/efficientdet_tf2/retail_detector_100.tlt' registry"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_tf2/cv/efficientdet/scripts/train.py>", line 3, in <module>
  File "<frozen cv.efficientdet.scripts.train>", line 233, in <module>
  File "<frozen common.hydra.hydra_runner>", line 87, in wrapper
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 367, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 251, in run_and_report
    assert mdl is not None
AssertionError
Sending telemetry data.
Telemetry data couldn't be sent, but the command ran successfully.
[Error]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL
2023-04-11 23:43:53,561 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

It looks like it can’t load the pre-trained .tlt model, but I don’t know why (I checked that the file is present in the Docker container, and it is).

Do you have any suggestions?
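For what it’s worth, the TFRecord side of the pipeline looks healthy. Here is a quick stdlib-only sanity check I can run against the generated shards (`count_records` is a hypothetical helper I wrote for this, not part of TAO):

```python
import glob
import struct

def count_records(pattern):
    """Count records in TFRecord shards matching a glob pattern.

    TFRecord framing: 8-byte little-endian payload length, 4-byte
    length CRC, payload bytes, 4-byte payload CRC. CRCs are skipped,
    not verified.
    """
    total = 0
    for path in glob.glob(pattern):
        with open(path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break  # end of shard
                (length,) = struct.unpack("<Q", header)
                f.seek(4 + length + 4, 1)  # skip length CRC, payload, data CRC
                total += 1
    return total

# e.g. count_records('/workspace/tao-experiments/data/train/tf_records/train-*')
```

Running it against my `train-*` and `val-*` patterns returns non-zero counts, so the conversion itself produced data.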

Can you share the result when you run the command below?
$ tao efficientdet_tf2 run ls /workspace/tao-experiments/data/efficientdet_tf2/retail_detector_100.tlt

It returns the path to the .tlt file, so the file is in the container at the given path.

I don’t think the problem is that it can’t find the file: when I removed the file, the error was different.

Please try changing the key from nvidia_tlt to nvidia_tao.
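That is, the relevant lines in spec_train.yaml would look like this (checkpoint path taken from your spec above):

```yaml
train:
  checkpoint: '/workspace/tao-experiments/data/efficientdet_tf2/retail_detector_100.tlt'
key: 'nvidia_tao'
```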

I’ve tried that one and a few others (I gathered some keys from other pre-trained models), but none of them works.

Could you please try this model: Retail Object Detection | NVIDIA NGC
$ wget 'https://api.ngc.nvidia.com/v2/models/nvidia/tao/retail_object_detection/versions/trainable_binary_v1.0/files/retail_detector_binary.tlt'

I’ve tried it, but I’m getting the same error :(

2023-04-13 14:41:13,532 [INFO] root: Registry: ['nvcr.io']
2023-04-13 14:41:13,593 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1
2023-04-13 14:41:21.385968: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
[1681396888.916641] [ip-172-31-7-160:16   :f]        vfs_fuse.c:424  UCX  WARN  failed to connect to vfs socket '': Invalid argument
2023-04-13 14:41:31,538 [WARNING] matplotlib: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-lzgp4nnc because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2023-04-13 14:41:32,032 [INFO] matplotlib.font_manager: generated new fontManager
[1681396897.225256] [ip-172-31-7-160:324  :f]        vfs_fuse.c:424  UCX  WARN  failed to connect to vfs socket '': Invalid argument
<frozen common.hydra.hydra_runner>:87: UserWarning: 
'spec_train.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
Setting up communication with ClearML server.
ClearML task init failed with error ClearML configuration could not be found (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at https://app.clear.ml
Training will still continue.
Log file already exists at /workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned/status.json
Starting efficientdet training.
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7efe82f83f70> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7efe82f83f70>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7efe82f83f70> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7efe82f83f70>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe840901f0> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe840901f0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe840901f0> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe840901f0>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
target_size = (416, 416), output_size = (416, 416)
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c040> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c040>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c040> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c040>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7efe8043c310> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7efe8043c310>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7efe8043c310> and will run it as-is.
Cause: Unable to locate the source code of <function CocoDataset.__call__.<locals>._prefetch_dataset at 0x7efe8043c310>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c430> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c430>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c430> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c430>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c550> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c550>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c550> and will run it as-is.
Cause: could not parse the source code of <function CocoDataset.__call__.<locals>.<lambda> at 0x7efe8043c550>: no matching AST found among candidates:

To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Building unpruned graph...
WARNING:tensorflow:AutoGraph could not transform <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x7efe082519d0>> and will run it as-is.
Cause: Unable to locate the source code of <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x7efe082519d0>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x7efe082519d0>> and will run it as-is.
Cause: Unable to locate the source code of <bound method ImageResizeLayer.call of <nvidia_tao_tf2.cv.efficientdet.layers.image_resize_layer.ImageResizeLayer object at 0x7efe082519d0>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x7efe8009e970>> and will run it as-is.
Cause: Unable to locate the source code of <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x7efe8009e970>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
AutoGraph could not transform <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x7efe8009e970>> and will run it as-is.
Cause: Unable to locate the source code of <bound method WeightedFusion.call of <nvidia_tao_tf2.cv.efficientdet.layers.weighted_fusion_layer.WeightedFusion object at 0x7efe8009e970>>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
/usr/local/lib/python3.8/dist-packages/keras/backend.py:450: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
"The indicated 'retail_detector_binary.tlt' artifact does not exist in the '/workspace/tao-experiments/data/efficientdet_tf2/retail_detector_binary.tlt' registry"
Error executing job with overrides: []
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 368, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/clearml/binding/hydra_bind.py", line 88, in _patched_hydra_run
    return PatchHydra._original_hydra_run(self, config_name, task_function, overrides, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 110, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "/usr/local/lib/python3.8/dist-packages/clearml/binding/hydra_bind.py", line 170, in _patched_task_function
    return task_function(a_config, *a_args, **a_kwargs)
  File "<frozen cv.efficientdet.scripts.train>", line 229, in main
  File "<frozen common.decorators>", line 76, in _func
  File "<frozen common.decorators>", line 49, in _func
  File "<frozen cv.efficientdet.scripts.train>", line 108, in run_experiment
  File "<frozen cv.efficientdet.utils.helper>", line 61, in decode_eff
  File "<frozen eff.core.archive>", line 544, in restore_artifact
KeyError: "The indicated 'retail_detector_binary.tlt' artifact does not exist in the '/workspace/tao-experiments/data/efficientdet_tf2/retail_detector_binary.tlt' registry"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_tf2/cv/efficientdet/scripts/train.py>", line 3, in <module>
  File "<frozen cv.efficientdet.scripts.train>", line 233, in <module>
  File "<frozen common.hydra.hydra_runner>", line 87, in wrapper
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 367, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 251, in run_and_report
    assert mdl is not None
AssertionError
Sending telemetry data.
Telemetry data couldn't be sent, but the command ran successfully.
[Error]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL
2023-04-13 14:41:59,283 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Config:

data:
  loader:
    prefetch_size: 4
    shuffle_file: True
  num_classes: 97
  image_size: '416x416'
  max_instances_per_image: 10
  train_tfrecords:
    - '/workspace/tao-experiments/data/train/tf_records/train-*'
  val_tfrecords:
    - '/workspace/tao-experiments/data/val/tf_records/val-*'
  val_json_file: '/workspace/tao-experiments/data/val/annotations.json'
train:
  num_examples_per_epoch: 1000
  batch_size: 1
  checkpoint: '/workspace/tao-experiments/data/efficientdet_tf2/retail_detector_binary.tlt'
model:
  name: 'efficientdet-d5'
key: 'nvidia_tao'
results_dir: '/workspace/tao-experiments/efficientdet_tf2/experiment_dir_unpruned'

ls output:

$ tao efficientdet_tf2 run ls /workspace/tao-experiments/data/efficientdet_tf2
2023-04-13 14:40:54,766 [INFO] root: Registry: ['nvcr.io']
2023-04-13 14:40:54,823 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1
retail_detector_100.tlt         retail_detector_binary.tlt
retail_detector_100_int8.txt    retail_object_detection_vdeployable_100_v1.0
retail_detector_100_labels.txt
2023-04-13 14:40:59,592 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

After checking, please rename the models as below.

For the binary retail model (Retail Object Detection | NVIDIA NGC), please rename it to efficientdet-d5_038.tlt

For the 100-class retail model (Retail Object Detection | NVIDIA NGC), please rename it to efficientdet-d5_046.tlt
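In shell form, the suggested fix is roughly the following (a sketch; MODEL_DIR is the path used in this thread, and the renaming presumably works because the loader looks up the archive’s internal artifact by the checkpoint’s file name):

```shell
# Rename the downloaded checkpoints so the file names match what the
# EfficientDet-D5 loader expects (names suggested in this thread).
MODEL_DIR=/workspace/tao-experiments/data/efficientdet_tf2
mv "$MODEL_DIR/retail_detector_binary.tlt" "$MODEL_DIR/efficientdet-d5_038.tlt"
mv "$MODEL_DIR/retail_detector_100.tlt"    "$MODEL_DIR/efficientdet-d5_046.tlt"
```

Remember to update `train.checkpoint` in the spec file to the new file name as well.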

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.