TAO UNet training from unet.ipynb in Jupyter Notebook fails

I’m running the open UNet model architecture using the Jupyter notebook unet/unet.ipynb.
When I run the “tao unet train” cell, it fails; the details are shown below.
Why might this be happening?
How can I fix it?

For multi-GPU, change --gpus based on your machine.
2022-07-14 14:31:01,359 [INFO] root: Registry: ['nvcr.io']
2022-07-14 14:31:01,486 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
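(For reference, the warning above is about the "DockerOptions" section of ~/.tao_mounts.json. A minimal sketch of that section, with placeholder values — substitute the output of `id -u` and `id -g` on your own host — would look like:)

```json
{
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```

This is optional and unrelated to the failure itself; it only makes files written by the container owned by your host user instead of root.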
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/checkpoint_saver_hook.py:21: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/pretrained_restore_hook.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/pretrained_restore_hook.py:23: The name tf.logging.WARN is deprecated. Please use tf.compat.v1.logging.WARN instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py:405: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

Loading experiment spec at /home/ubuntu/Desktop/new_model/tao-experiment/unet/specs/unet_train_resnet_unet_isbi.txt.
2022-07-14 06:31:07,981 [INFO] __main__: Loading experiment spec at /home/ubuntu/Desktop/new_model/tao-experiment/unet/specs/unet_train_resnet_unet_isbi.txt.
2022-07-14 06:31:07,983 [INFO] iva.unet.spec_handler.spec_loader: Merging specification from /home/ubuntu/Desktop/new_model/tao-experiment/unet/specs/unet_train_resnet_unet_isbi.txt
2022-07-14 06:31:07,985 [INFO] root: Initializing the pre-trained weights from /home/ubuntu/Desktop/new_model/tao-experiment/unet/pretrained_resnet18/pretrained_semantic_segmentation_vresnet18/resnet_18.hdf5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2022-07-14 06:31:07,988 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2022-07-14 06:31:07,999 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2022-07-14 06:31:08,020 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

2022-07-14 06:31:08,025 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

2022-07-14 06:31:08,864 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2022-07-14 06:31:09,066 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2022-07-14 06:31:09,066 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2022-07-14 06:31:09,234 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

2022-07-14 06:31:09,716 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

2022-07-14 06:31:09,735 [INFO] iva.unet.model.utilities: Label Id 0: Train Id 0
2022-07-14 06:31:09,735 [INFO] iva.unet.model.utilities: Label Id 1: Train Id 1
INFO:tensorflow:Using config: {'_model_dir': '/home/ubuntu/Desktop/new_model/tao-experiment/unet/isbi_experiment_unpruned', '_tf_random_seed': None, '_save_summary_steps': 5, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 1
inter_op_parallelism_threads: 38
gpu_options {
  allow_growth: true
  visible_device_list: "0"
  force_gpu_compatible: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': None, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f52fe8ca908>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
2022-07-14 06:31:09,736 [INFO] tensorflow: Using config: {'_model_dir': '/home/ubuntu/Desktop/new_model/tao-experiment/unet/isbi_experiment_unpruned', '_tf_random_seed': None, '_save_summary_steps': 5, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 1
inter_op_parallelism_threads: 38
gpu_options {
  allow_growth: true
  visible_device_list: "0"
  force_gpu_compatible: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': None, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f52fe8ca908>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Phase train: Total 20 files.
2022-07-14 06:31:09,747 [INFO] iva.unet.model.utilities: The total number of training samples 20 and the batch size per                 GPU 3
2022-07-14 06:31:09,747 [INFO] iva.unet.model.utilities: Cannot iterate over exactly 20 samples with a batch size of 3; each epoch will therefore take one extra step.
2022-07-14 06:31:09,747 [INFO] iva.unet.model.utilities: Steps per epoch taken: 7
Running for 50 Epochs
2022-07-14 06:31:09,747 [INFO] __main__: Running for 50 Epochs
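(The "Steps per epoch taken: 7" line above is just the sample count divided by the batch size, rounded up, as the preceding log line explains — 20 samples with a batch size of 3 cannot be iterated exactly, so one extra step is taken. A quick sanity check:)

```python
import math

num_samples = 20   # "Phase train: Total 20 files."
batch_size = 3     # batch size per GPU from the spec file
epochs = 50

# ceil(20 / 3) = 7, matching "Steps per epoch taken: 7" in the log
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs  # 350 steps for the full run

print(steps_per_epoch, total_steps)
```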
INFO:tensorflow:Create CheckpointSaverHook.
2022-07-14 06:31:09,747 [INFO] tensorflow: Create CheckpointSaverHook.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

2022-07-14 06:31:10,563 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:Entity <bound method Dataset.read_image_and_label_tensors of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.read_image_and_label_tensors of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,616 [WARNING] tensorflow: Entity <bound method Dataset.read_image_and_label_tensors of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.read_image_and_label_tensors of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f528ff46e18> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f528ff46e18>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,636 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f528ff46e18> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f528ff46e18>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <bound method Dataset.rgb_to_bgr_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.rgb_to_bgr_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,644 [WARNING] tensorflow: Entity <bound method Dataset.rgb_to_bgr_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.rgb_to_bgr_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <bound method Dataset.cast_img_lbl_dtype_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.cast_img_lbl_dtype_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,654 [WARNING] tensorflow: Entity <bound method Dataset.cast_img_lbl_dtype_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.cast_img_lbl_dtype_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <bound method Dataset.resize_image_and_label_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.resize_image_and_label_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,663 [WARNING] tensorflow: Entity <bound method Dataset.resize_image_and_label_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.resize_image_and_label_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/utils/data_loader.py:414: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

2022-07-14 06:31:10,663 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/utils/data_loader.py:414: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397d2f0> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397d2f0>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,676 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397d2f0> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397d2f0>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397dbf8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397dbf8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,684 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397dbf8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397dbf8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <bound method Dataset.transpose_to_nchw of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.transpose_to_nchw of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,692 [WARNING] tensorflow: Entity <bound method Dataset.transpose_to_nchw of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.transpose_to_nchw of <iva.unet.utils.data_loader.Dataset object at 0x7f52fe8ca860>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397dd08> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397dd08>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-07-14 06:31:10,707 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397dd08> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f515397dd08>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
INFO:tensorflow:Calling model_fn.
2022-07-14 06:31:10,730 [INFO] tensorflow: Calling model_fn.
2022-07-14 06:31:10,730 [INFO] iva.unet.utils.model_fn: {'exec_mode': 'train', 'model_dir': '/home/ubuntu/Desktop/new_model/tao-experiment/unet/isbi_experiment_unpruned', 'resize_padding': False, 'resize_method': 'BILINEAR', 'log_dir': None, 'batch_size': 3, 'learning_rate': 9.999999747378752e-05, 'crossvalidation_idx': None, 'max_steps': None, 'regularizer_type': 2, 'weight_decay': 1.9999999494757503e-05, 'log_summary_steps': 10, 'warmup_steps': 0, 'augment': False, 'use_amp': False, 'use_trt': False, 'use_xla': False, 'loss': 'cross_dice_sum', 'epochs': 50, 'pretrained_weights_file': None, 'unet_model': <iva.unet.model.resnet_unet.ResnetUnet object at 0x7f528ff77e80>, 'key': 'nvidia_tlt', 'experiment_spec': random_seed: 42
dataset_config {
  dataset: "custom"
  input_image_type: "grayscale"
  train_images_path: "/home/ubuntu/Desktop/new_model/tao-experiment/data/isbi/images/train"
  train_masks_path: "/home/ubuntu/Desktop/new_model/tao-experiment/data/isbi/masks/train"
  val_images_path: "/home/ubuntu/Desktop/new_model/tao-experiment/data/isbi/images/val"
  val_masks_path: "/home/ubuntu/Desktop/new_model/tao-experiment/data/isbi/masks/val"
  test_images_path: "/home/ubuntu/Desktop/new_model/tao-experiment/data/isbi/images/test"
  data_class_config {
    target_classes {
      name: "foreground"
      mapping_class: "foreground"
    }
    target_classes {
      name: "background"
      label_id: 1
      mapping_class: "background"
    }
  }
  augmentation_config {
    spatial_augmentation {
      hflip_probability: 0.5
      vflip_probability: 0.5
      crop_and_resize_prob: 0.5
    }
    brightness_augmentation {
      delta: 0.20000000298023224
    }
  }
}
model_config {
  num_layers: 18
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
  all_projections: true
  model_input_height: 320
  model_input_width: 320
  model_input_channels: 1
}
training_config {
  batch_size: 3
  regularizer {
    type: L2
    weight: 1.9999999494757503e-05
  }
  optimizer {
    adam {
      epsilon: 9.99999993922529e-09
      beta1: 0.8999999761581421
      beta2: 0.9990000128746033
    }
  }
  checkpoint_interval: 1
  log_summary_steps: 10
  learning_rate: 9.999999747378752e-05
  loss: "cross_dice_sum"
  epochs: 50
}
, 'seed': 42, 'benchmark': False, 'temp_dir': '/tmp/tmpy3xdbl94', 'num_classes': 2, 'start_step': 0, 'checkpoint_interval': 1, 'model_json': None, 'load_graph': False, 'phase': None}
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 1, 320, 320)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 160, 160) 3200        input_1[0][0]                    
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 160, 160) 0           conv1[0][0]                      
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 80, 80)   36928       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (None, 64, 80, 80)   0           block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 80, 80)   36928       block_1a_relu_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 80, 80)   4160        activation_1[0][0]               
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 80, 80)   0           block_1a_conv_2[0][0]            
                                                                 block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
block_1a_relu (Activation)      (None, 64, 80, 80)   0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 80, 80)   36928       block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (None, 64, 80, 80)   0           block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 80, 80)   36928       block_1b_relu_1[0][0]            
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 80, 80)   4160        block_1a_relu[0][0]              
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 80, 80)   0           block_1b_conv_2[0][0]            
                                                                 block_1b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
block_1b_relu (Activation)      (None, 64, 80, 80)   0           add_2[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 40, 40)  73856       block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (None, 128, 40, 40)  0           block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 40, 40)  147584      block_2a_relu_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 40, 40)  8320        block_1b_relu[0][0]              
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 40, 40)  0           block_2a_conv_2[0][0]            
                                                                 block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
block_2a_relu (Activation)      (None, 128, 40, 40)  0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 40, 40)  147584      block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (None, 128, 40, 40)  0           block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 40, 40)  147584      block_2b_relu_1[0][0]            
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 40, 40)  16512       block_2a_relu[0][0]              
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 40, 40)  0           block_2b_conv_2[0][0]            
                                                                 block_2b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
block_2b_relu (Activation)      (None, 128, 40, 40)  0           add_4[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 20, 20)  295168      block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (None, 256, 20, 20)  0           block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 20, 20)  590080      block_3a_relu_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 20, 20)  33024       block_2b_relu[0][0]              
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 20, 20)  0           block_3a_conv_2[0][0]            
                                                                 block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
block_3a_relu (Activation)      (None, 256, 20, 20)  0           add_5[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 20, 20)  590080      block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (None, 256, 20, 20)  0           block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 20, 20)  590080      block_3b_relu_1[0][0]            
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 20, 20)  65792       block_3a_relu[0][0]              
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 20, 20)  0           block_3b_conv_2[0][0]            
                                                                 block_3b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
block_3b_relu (Activation)      (None, 256, 20, 20)  0           add_6[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 20, 20)  1180160     block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (None, 512, 20, 20)  0           block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 20, 20)  2359808     block_4a_relu_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 20, 20)  131584      block_3b_relu[0][0]              
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 20, 20)  0           block_4a_conv_2[0][0]            
                                                                 block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
block_4a_relu (Activation)      (None, 512, 20, 20)  0           add_7[0][0]                      
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 20, 20)  2359808     block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (None, 512, 20, 20)  0           block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 20, 20)  2359808     block_4b_relu_1[0][0]            
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 20, 20)  262656      block_4a_relu[0][0]              
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 20, 20)  0           block_4b_conv_2[0][0]            
                                                                 block_4b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
block_4b_relu (Activation)      (None, 512, 20, 20)  0           add_8[0][0]                      
__________________________________________________________________________________________________
conv2d_transpose_1 (Conv2DTrans (None, 256, 40, 40)  2097408     block_4b_relu[0][0]              
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 384, 40, 40)  0           conv2d_transpose_1[0][0]         
                                                                 block_2b_relu[0][0]              
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 384, 40, 40)  0           concatenate_1[0][0]              
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 256, 40, 40)  884992      activation_2[0][0]               
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 256, 40, 40)  0           conv2d_1[0][0]                   
__________________________________________________________________________________________________
conv2d_transpose_2 (Conv2DTrans (None, 128, 80, 80)  524416      activation_3[0][0]               
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 192, 80, 80)  0           conv2d_transpose_2[0][0]         
                                                                 block_1b_relu[0][0]              
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 192, 80, 80)  0           concatenate_2[0][0]              
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 128, 80, 80)  221312      activation_4[0][0]               
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 128, 80, 80)  0           conv2d_2[0][0]                   
__________________________________________________________________________________________________
conv2d_transpose_3 (Conv2DTrans (None, 64, 160, 160) 131136      activation_5[0][0]               
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 128, 160, 160 0           conv2d_transpose_3[0][0]         
                                                                 activation_1[0][0]               
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 128, 160, 160 0           concatenate_3[0][0]              
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 64, 160, 160) 73792       activation_6[0][0]               
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 64, 160, 160) 0           conv2d_3[0][0]                   
__________________________________________________________________________________________________
conv2d_transpose_4 (Conv2DTrans (None, 64, 320, 320) 65600       activation_7[0][0]               
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 64, 320, 320) 0           conv2d_transpose_4[0][0]         
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 64, 320, 320) 36928       activation_8[0][0]               
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 64, 320, 320) 0           conv2d_4[0][0]                   
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 2, 320, 320)  1154        activation_9[0][0]               
==================================================================================================
Total params: 15,555,458
Trainable params: 15,555,458
Non-trainable params: 0
__________________________________________________________________________________________________
2022-07-14 14:31:11,315 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Hi,

This looks like a TAO Toolkit related issue. We will move this post to the TAO Toolkit forum.

Thanks!


Could you please run the commands below to narrow down the issue?
$ tao unet run /bin/bash
Then, inside the docker container, run:
# unet train xxx
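For reference, the two-step flow above can be sketched as follows. The spec file, results directory, and key below are placeholders, not values from your notebook; replace them with your own. Running `unet train` directly inside the container prints the full Python traceback instead of the container silently stopping.

```shell
# Step 1 (on the host): open an interactive shell inside the TAO container.
#   tao unet run /bin/bash
#
# Step 2 (inside the container): invoke training directly, without the
# "tao" launcher. The -e/-r/-k flags mirror the notebook's train cell;
# all three values below are placeholders.
SPEC=/workspace/tao-experiments/unet/specs/unet_train.txt   # placeholder spec
RESULTS=/workspace/tao-experiments/unet/results             # placeholder output dir
KEY=tlt_encode                                              # placeholder key
echo unet train -e "$SPEC" -r "$RESULTS" -k "$KEY"
```

The command printed by the `echo` is what you would actually run inside the container; any error it produces on the terminal is the detail needed to diagnose the failure.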

There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks