ValueError: No such layer: block_1b_relu in unet nvidia-tlt 0.1.4

After upgrading nvidia-tlt I am no longer able to run training of resnet-unet, which used to work with the previous version. When constructing the network, the command throws:

Traceback (most recent call last):
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 419, in <module>
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 413, in main
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 314, in run_experiment
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 229, in train_unet
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 105, in run_training_loop
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/utils/model_fn.py", line 121, in unet_fn
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/model/unet_model.py", line 104, in construct_model
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/model/resnet_unet.py", line 52, in construct_decoder_model
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 358, in get_layer
    raise ValueError('No such layer: ' + name)
ValueError: No such layer: block_1b_relu

How did you upgrade nvidia-tlt?
Can you share the output of $ tlt info?

And what is the command line and full log when you get this error?

pip3 install --upgrade nvidia-tlt
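For reference, the launcher version that actually got installed can be confirmed with the standard pip command:

$ pip3 show nvidia-tlt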

Configuration of the TLT Instance
dockers: ['nvidia/tlt-streamanalytics', 'nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 04/16/2021

tlt unet train -k nvidia_tlt -r /new/media/hdd/datasets/mapillary/vistas/segmentation_training_2gpu -e /new/media/hdd/datasets/mapillary/vistas/experiment_config.cfg --use_amp

log_b7.txt (55.0 KB)

Can you download the latest pretrained models from NVIDIA NGC to check if that helps?
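For example, with the NGC CLI something along these lines should work; the model path below is an assumption from memory, so please confirm it against the list output first:

$ ngc registry model list "nvidia/tlt_*segmentation*"
$ ngc registry model download-version nvidia/tlt_semantic_segmentation:resnet18   # illustrative path, verify before use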

Tried, same result. Actually, I also ran the command without a pretrained model, and it is still the same error.

Can you use a new result folder?

log_b7.txt (57.9 KB)

Can you run the following command here?
$ tlt info --verbose

Can you also directly pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 and try running inside it in interactive mode?
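A typical way to do that, assuming the NVIDIA container runtime is installed (newer Docker versions may want --gpus all instead of --runtime=nvidia), and mounting your dataset directory as needed:

$ docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3
$ docker run --runtime=nvidia -it --rm -v /new/media/hdd/datasets:/workspace/datasets nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 /bin/bash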

tlt.txt (819 Bytes)

Tried from a container directly; same error.

Can you share the training spec file?

It should be in the log I attached earlier (see ValueError: No such layer: block_1b_relu in unet nvidia-tlt 0.1.4 - #8 by ksokolov).

I can reproduce this with the resnet10 pretrained model. Something appears to be wrong. I will check internally.

Please try to use another pretrained model instead.
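For instance, switching the encoder to resnet18 should only require changing the model_config section of the training spec (field names below follow the TLT 3.0 UNet spec as far as I recall; adjust to your actual file, and point to the matching resnet18 pretrained weights if you use any):

model_config {
  arch: "resnet"
  num_layers: 18   # was 10; resnet18 avoids the missing block_1b_relu layer
  # other fields unchanged
}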

I am able to run with resnet18, in both 1- and 2-GPU mode with batch size 4. I will wait for resnet10 to start working so I can check it against the original issue as well.

Is there any estimate of when the model will be available for training? I still hope to use a smaller one because of the requirements of my project.

Sorry, this is a known issue in Unet. The fix will be available in the next release.

How long until the next release? Is it a matter of weeks or months?

Not sure yet. Maybe two months.