For multi-GPU, change --gpus based on your machine. 2022-06-02 02:19:56,272 [INFO] root: Registry: ['nvcr.io'] 2022-06-02 02:19:56,516 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 2022-06-02 02:19:56,678 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user":"UID:GID" in the DockerOptions portion of the "/home/sysadmin/.tao_mounts.json" file. You can obtain your users UID and GID by using the "id -u" and "id -g" commands on the terminal. Using TensorFlow backend. WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. Using TensorFlow backend. [INFO] Loading specification from /workspace/tao-experiments/mask_rcnn/specs/maskrcnn_train_resnet50.txt INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpgagrb7d2', '_tf_random_seed': 123, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 1 inter_op_parallelism_threads: 4 gpu_options { allow_growth: true force_gpu_compatible: true } allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: TWO } } , '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': None, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} [MaskRCNN] INFO : Loading pretrained model... WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/executer/distributed_executer.py:220: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead. WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/executer/distributed_executer.py:223: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead. WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/executer/distributed_executer.py:224: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead. [MaskRCNN] INFO : Create EncryptCheckpointSaverHook. [MaskRCNN] INFO : ================================= [MaskRCNN] INFO : Start training cycle 01 [MaskRCNN] INFO : ================================= WARNING:tensorflow:Entity ._prefetch_dataset at 0x7f68c39cb7b8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of ._prefetch_dataset at 0x7f68c39cb7b8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:349: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead. WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead. WARNING:tensorflow:Entity could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of . Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical. WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical. WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical. WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical. Traceback (most recent call last): File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 222, in File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 218, in main File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 85, in run_executer File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/executer/distributed_executer.py", line 399, in train_and_eval File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train loss = self._train_model(input_fn, hooks, saving_listeners) File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model return self._train_model_default(input_fn, hooks, saving_listeners) File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default input_fn, ModeKeys.TRAIN)) File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1025, in _get_features_and_labels_from_input_fn self._call_input_fn(input_fn, mode)) File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1116, in _call_input_fn return input_fn(**kwargs) File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/dataloader/dataloader.py", line 141, in __call__ File "/opt/nvidia/third_party/keras/tensorflow_backend.py", line 356, in new_map self, _map_func_set_random_wrapper, num_parallel_calls=num_parallel_calls File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2004, in map self, map_func, num_parallel_calls, preserve_cardinality=False)) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3569, in __init__ use_legacy_function=use_legacy_function) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2810, in __init__ self._function = wrapper_fn._get_concrete_function_internal() File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1853, in _get_concrete_function_internal *args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1847, in _get_concrete_function_internal_garbage_collected graph_function, _, _ = self._maybe_define_function(args, kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2147, in _maybe_define_function graph_function = self._create_graph_function(args, kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2038, in _create_graph_function capture_by_value=self._capture_by_value), File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func func_outputs = python_func(*func_args, **func_kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2804, in wrapper_fn ret = _wrapper_helper(*args) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2749, in _wrapper_helper ret = autograph.tf_convert(func, ag_ctx)(*nested_args) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper raise e.ag_error_metadata.to_exception(e) ValueError: in converted code: /opt/nvidia/third_party/keras/tensorflow_backend.py:353 _map_func_set_random_wrapper * return map_func(*args, **kwargs) /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/dataloader/dataloader_utils.py:193 dataset_parser /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/dataloader/dataloader_utils.py:392 process_gt_masks_for_training /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/ops/preprocess_ops.py:170 crop_gt_masks /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/array_ops.py:447 shape return shape_internal(input, name, optimize=True, out_type=out_type) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/array_ops.py:471 shape_internal input_tensor = ops.convert_to_tensor(input) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1184 convert_to_tensor return convert_to_tensor_v2(value, dtype, preferred_dtype, name) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1242 convert_to_tensor_v2 as_ref=False) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1297 internal_convert_to_tensor ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:286 _constant_tensor_conversion_function return constant(v, dtype=dtype, name=name) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:227 constant allow_broadcast=True) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:265 _constant_impl allow_broadcast=allow_broadcast)) /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/tensor_util.py:437 make_tensor_proto raise ValueError("None values not supported.") ValueError: None values not supported. [MaskRCNN] ERROR : Job finished with an uncaught exception: `FAILURE` 2022-06-02 02:20:19,840 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.