2022-04-20 14:10:52,989 [INFO] root: Registry: ['nvcr.io'] 2022-04-20 14:10:53,037 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 2022-04-20 14:10:53,087 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user":"UID:GID" in the DockerOptions portion of the "/home/inviol/.tao_mounts.json" file. You can obtain your users UID and GID by using the "id -u" and "id -g" commands on the terminal. Using TensorFlow backend. WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. Using TensorFlow backend. [INFO] Loading specification from /workspace/tao-experiments/mask_rcnn/specs/maskrcnn_retrain_resnet50.txt INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp3hjgb_1a', '_tf_random_seed': 123, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 1 inter_op_parallelism_threads: 4 gpu_options { allow_growth: true force_gpu_compatible: true } allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: TWO } } , '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': None, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} [MaskRCNN] INFO : Create EncryptCheckpointSaverHook. 
[MaskRCNN] INFO : ================================= [MaskRCNN] INFO : Start training cycle 01 [MaskRCNN] INFO : ================================= WARNING:tensorflow:Entity ._prefetch_dataset at 0x7f067da45d90> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of ._prefetch_dataset at 0x7f067da45d90>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:349: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead. WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead. WARNING:tensorflow:Entity could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of . Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical. 
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical. WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical. WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical. INFO:tensorflow:Calling model_fn. [MaskRCNN] INFO : *********************** [MaskRCNN] INFO : Loading model graph... [MaskRCNN] INFO : *********************** WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code [MaskRCNN] INFO : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_2/ [MaskRCNN] INFO : [ROI OPs] Using Batched NMS... 
Scope: MLP/multilevel_propose_rois/level_3/ [MaskRCNN] INFO : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_4/ [MaskRCNN] INFO : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_5/ [MaskRCNN] INFO : [ROI OPs] Using Batched NMS... Scope: MLP/multilevel_propose_rois/level_6/ WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. 
If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code Traceback (most recent call last): File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 222, in File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 218, in main File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 85, in run_executer File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/executer/distributed_executer.py", line 399, in train_and_eval File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train loss = self._train_model(input_fn, hooks, saving_listeners) File 
"/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model return self._train_model_default(input_fn, hooks, saving_listeners) File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default features, labels, ModeKeys.TRAIN, self.config) File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn model_fn_results = self._model_fn(features=features, **kwargs) File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 686, in mask_rcnn_model_fn File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 522, in _model_fn File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 187, in build_model_graph File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/utils/model_loader.py", line 104, in get_model_with_input File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/model_config.py", line 96, in model_from_json return deserialize(config, custom_objects=custom_objects) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/layers/serialization.py", line 105, in deserialize printable_module_name='layer') File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/generic_utils.py", line 191, in deserialize_keras_object 
list(custom_objects.items()))) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1081, in from_config process_node(layer, node_data) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1039, in process_node layer(input_tensors, **kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__ outputs = call_fn(cast_inputs, *args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper raise e.ag_error_metadata.to_exception(e) ValueError: in converted code: /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/layers/reshape_layer.py:25 call /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/array_ops.py:131 reshape result = gen_array_ops.reshape(tensor, shape, name) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_array_ops.py:8115 reshape "Reshape", tensor=tensor, shape=shape, name=name) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py:794 _apply_op_helper op_def=op_def) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py:513 new_func return func(*args, **kwargs) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:3357 create_op attrs, op_def, compute_device) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:3426 _create_op_internal op_def=op_def) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1770 __init__ control_input_ops) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1610 _create_c_op raise ValueError(str(e)) ValueError: Cannot reshape a tensor with 25690112 elements to shape [128,256,14,14] 
(6422528 elements) for 'mask_head_reshape_1/mask_head_reshape_1' (op: 'Reshape') with input shapes: [4,128,256,14,14], [4] and with input tensors computed as partial shapes: input[1] = [128,256,14,14].
[MaskRCNN] ERROR : Job finished with an uncaught exception: `FAILURE`
2022-04-20 14:11:01,604 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
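
The element counts in the final ValueError can be checked by hand. The sketch below only multiplies out the two shapes quoted in the error message; the variable names are illustrative and nothing here calls into TAO or TensorFlow. It shows that the tensor reaching `mask_head_reshape_1` has exactly 4 times as many elements as the target shape, which matches the extra leading dimension of 4 in the input. One possible reading, which the log itself does not confirm, is that this leading 4 is the batch size of the current run, while the target shape [128,256,14,14] was fixed in the loaded model graph for a different batch size.

```python
from math import prod  # Python 3.8+

# Shapes copied verbatim from the ValueError in the log above.
input_shape = (4, 128, 256, 14, 14)   # tensor fed into mask_head_reshape_1
target_shape = (128, 256, 14, 14)     # shape requested by the Reshape op

input_elems = prod(input_shape)
target_elems = prod(target_shape)

# A reshape is only valid when both element counts are equal.
ratio = input_elems // target_elems

print(input_elems, target_elems, ratio)  # 25690112 6422528 4
```

If that reading holds, comparing the batch size in maskrcnn_retrain_resnet50.txt with the one used when the loaded model was originally trained would be a reasonable first check.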