ValueError: No such layer: block_1b_relu in unet nvidia-tlt 0.1.4

After upgrading nvidia-tlt I am no longer able to run training of resnet-unet, which used to work with the previous version. When constructing the network, the command throws:

Traceback (most recent call last):
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 419, in <module>
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 413, in main
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 314, in run_experiment
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 229, in train_unet
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 105, in run_training_loop
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/utils/model_fn.py", line 121, in unet_fn
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/model/unet_model.py", line 104, in construct_model
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/model/resnet_unet.py", line 52, in construct_decoder_model
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 358, in get_layer
    raise ValueError('No such layer: ' + name)
ValueError: No such layer: block_1b_relu

How did you upgrade nvidia-tlt?
Can you share the output of $ tlt info?

And what is the command line and full log when you get this error?

pip3 install --upgrade nvidia-tlt
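For reference, the launcher version that actually got installed can be confirmed with the standard pip command:

$ pip3 show nvidia-tlt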

Configuration of the TLT Instance
dockers: ['nvidia/tlt-streamanalytics', 'nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 04/16/2021

tlt unet train -k nvidia_tlt -r /new/media/hdd/datasets/mapillary/vistas/segmentation_training_2gpu -e /new/media/hdd/datasets/mapillary/vistas/experiment_config.cfg --use_amp

log_b7.txt (55.0 KB)

Can you download the latest pretrained models from NVIDIA NGC to check if that helps?
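For example, with the NGC CLI something along these lines should work; the model path below is an assumption from memory, so please confirm it against the list output first:

$ ngc registry model list "nvidia/tlt_*segmentation*"
$ ngc registry model download-version nvidia/tlt_semantic_segmentation:resnet18   # illustrative path, verify before use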

Tried, same result. Actually, I also ran the command without a pretrained model, and it is still the same error.

Can you use a new result folder?

log_b7.txt (57.9 KB)

Can you run the following command here?
$ tlt info --verbose

Can you also directly pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 and try running inside it in interactive mode?
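A typical way to do that, assuming the NVIDIA container runtime is installed (newer Docker versions may want --gpus all instead of --runtime=nvidia), and mounting your dataset directory as needed:

$ docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3
$ docker run --runtime=nvidia -it --rm -v /new/media/hdd/datasets:/workspace/datasets nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 /bin/bash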

tlt.txt (819 Bytes)

Tried from a container directly; same error.

Can you share the training spec file?

It should be in the log I attached earlier (see ValueError: No such layer: block_1b_relu in unet nvidia-tlt 0.1.4 - #8 by ksokolov).

I can reproduce this with the resnet10 pretrained model. Something appears to be wrong. I will check internally.

Please try to use another pretrained model instead.
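For instance, switching the encoder to resnet18 should only require changing the model_config section of the training spec (field names below follow the TLT 3.0 UNet spec as far as I recall; adjust to your actual file, and point to the matching resnet18 pretrained weights if you use any):

model_config {
  arch: "resnet"
  num_layers: 18   # was 10; resnet18 avoids the missing block_1b_relu layer
  # other fields unchanged
}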

I am able to run with resnet18, in both 1- and 2-GPU mode with batch size 4. I will wait for resnet10 to start working so I can check it against the original issue as well.

Is there any estimate of when the model will be available for training? I still hope to use a smaller one because of the requirements of my project.

Sorry, this is a known issue in Unet. The fix will be available in the next release.

How long until the next release? Is it a matter of weeks or months?

Not sure yet. Maybe two months.