I was able to run with multiple GPUs, but now I get a new error message: tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
INFO:tensorflow:Done calling model_fn.
2023-04-18 07:01:10,067 [INFO] tensorflow: Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2023-04-18 07:01:10,419 [INFO] tensorflow: Graph was finalized.
2023-04-18 07:01:10,420 [INFO] root: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
50da6c6f-0e2e-41cf-bd6a-260f5e8bd32a-dhrg8:66:246 [0] NCCL INFO comm 0x7fa00c469820 rank 0 nranks 4 cudaDev 0 busId 60 - Destroy COMPLETE
INFO:tensorflow:Done calling model_fn.
2023-04-18 07:01:10,496 [INFO] tensorflow: Done calling model_fn.
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/unet/scripts/train.py>", line 3, in <module>
File "<frozen iva.unet.scripts.train>", line 533, in <module>
File "<frozen iva.unet.scripts.train>", line 529, in main
File "<frozen iva.unet.scripts.train>", line 516, in main
File "<frozen iva.unet.scripts.train>", line 387, in run_experiment
File "<frozen iva.unet.scripts.evaluate>", line 323, in evaluate_unet
File "<frozen iva.unet.scripts.evaluate>", line 228, in run_evaluate_tlt
File "<frozen iva.unet.scripts.evaluate>", line 138, in print_compute_metrics
File "<frozen iva.unet.scripts.evaluate>", line 81, in compute_metrics_masks
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 955, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 638, in predict
hooks=all_hooks) as mon_sess:
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 725, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1207, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1212, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 878, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 647, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/session_manager.py", line 290, in prepare_session
config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/session_manager.py", line 194, in _restore_checkpoint
sess = session.Session(self._target, graph=self._graph, config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1585, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 699, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
INFO:tensorflow:Done calling model_fn.
2023-04-18 07:01:10,580 [INFO] tensorflow: Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2023-04-18 07:01:10,831 [INFO] tensorflow: Graph was finalized.
2023-04-18 07:01:10,832 [INFO] root: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
INFO:tensorflow:Graph was finalized.
2023-04-18 07:01:10,926 [INFO] tensorflow: Graph was finalized.
2023-04-18 07:01:10,926 [INFO] root: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/unet/scripts/train.py>", line 3, in <module>
File "<frozen iva.unet.scripts.train>", line 533, in <module>
File "<frozen iva.unet.scripts.train>", line 529, in main
File "<frozen iva.unet.scripts.train>", line 516, in main
File "<frozen iva.unet.scripts.train>", line 387, in run_experiment
File "<frozen iva.unet.scripts.evaluate>", line 323, in evaluate_unet
File "<frozen iva.unet.scripts.evaluate>", line 228, in run_evaluate_tlt
File "<frozen iva.unet.scripts.evaluate>", line 138, in print_compute_metrics
File "<frozen iva.unet.scripts.evaluate>", line 81, in compute_metrics_masks
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 955, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 638, in predict
hooks=all_hooks) as mon_sess:
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 725, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1207, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1212, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 878, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 647, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/session_manager.py", line 290, in prepare_session
config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/session_manager.py", line 194, in _restore_checkpoint
sess = session.Session(self._target, graph=self._graph, config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1585, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 699, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/unet/scripts/train.py>", line 3, in <module>
File "<frozen iva.unet.scripts.train>", line 533, in <module>
File "<frozen iva.unet.scripts.train>", line 529, in main
File "<frozen iva.unet.scripts.train>", line 516, in main
File "<frozen iva.unet.scripts.train>", line 387, in run_experiment
File "<frozen iva.unet.scripts.evaluate>", line 323, in evaluate_unet
File "<frozen iva.unet.scripts.evaluate>", line 228, in run_evaluate_tlt
File "<frozen iva.unet.scripts.evaluate>", line 138, in print_compute_metrics
File "<frozen iva.unet.scripts.evaluate>", line 81, in compute_metrics_masks
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 955, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 638, in predict
hooks=all_hooks) as mon_sess:
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 725, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1207, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1212, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 878, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 647, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/session_manager.py", line 290, in prepare_session
config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/session_manager.py", line 194, in _restore_checkpoint
sess = session.Session(self._target, graph=self._graph, config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1585, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 699, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
model.ckpt-100.meta
INFO:tensorflow:Using config: {'_model_dir': '/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/58d3d7e4-9886-4995-8d29-1fb280a59108/1a8c03ef-afcd-4eea-b0b9-6361551255bf/experiment_0/weights', '_tf_random_seed': None, '_save_summary_steps': 1, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': gpu_options {
}
allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9ff81ceef0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
2023-04-18 07:01:11,345 [INFO] tensorflow: Using config: {'_model_dir': '/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/58d3d7e4-9886-4995-8d29-1fb280a59108/1a8c03ef-afcd-4eea-b0b9-6361551255bf/experiment_0/weights', '_tf_random_seed': None, '_save_summary_steps': 1, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': gpu_options {
}
allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9ff81ceef0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
2023-04-18 07:01:11,347 [INFO] iva.unet.scripts.evaluate: Starting Evaluation.
0it [00:00, ?it/s]WARNING:tensorflow:Entity <bound method Dataset.read_image_and_label_tensors of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.read_image_and_label_tensors of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,367 [WARNING] tensorflow: Entity <bound method Dataset.read_image_and_label_tensors of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.read_image_and_label_tensors of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea9e429d8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea9e429d8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,382 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea9e429d8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea9e429d8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <bound method Dataset.rgb_to_bgr_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.rgb_to_bgr_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,392 [WARNING] tensorflow: Entity <bound method Dataset.rgb_to_bgr_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.rgb_to_bgr_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <bound method Dataset.cast_img_lbl_dtype_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.cast_img_lbl_dtype_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,401 [WARNING] tensorflow: Entity <bound method Dataset.cast_img_lbl_dtype_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.cast_img_lbl_dtype_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <bound method Dataset.resize_image_and_label_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.resize_image_and_label_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,410 [WARNING] tensorflow: Entity <bound method Dataset.resize_image_and_label_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.resize_image_and_label_tf of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea82158c8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea82158c8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,557 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea82158c8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea82158c8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea8215b70> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea8215b70>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,566 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea8215b70> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7f9ea8215b70>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <bound method Dataset.transpose_to_nchw of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.transpose_to_nchw of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,575 [WARNING] tensorflow: Entity <bound method Dataset.transpose_to_nchw of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset.transpose_to_nchw of <iva.unet.utils.data_loader.Dataset object at 0x7f9ffb4a00f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7fa12e5777b8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7fa12e5777b8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,586 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7fa12e5777b8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7fa12e5777b8>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7fa12e577a60> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7fa12e577a60>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2023-04-18 07:01:11,603 [WARNING] tensorflow: Entity <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7fa12e577a60> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <function Dataset.input_fn_aigs_tf.<locals>.<lambda> at 0x7fa12e577a60>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
INFO:tensorflow:Calling model_fn.
2023-04-18 07:01:11,614 [INFO] tensorflow: Calling model_fn.
2023-04-18 07:01:11,614 [INFO] iva.unet.utils.model_fn: {'exec_mode': 'train', 'model_dir': '/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/58d3d7e4-9886-4995-8d29-1fb280a59108/1a8c03ef-afcd-4eea-b0b9-6361551255bf/experiment_0/weights', 'resize_padding': True, 'resize_method': 'BILINEAR', 'log_dir': None, 'batch_size': 3, 'learning_rate': 0.00040598164196126163, 'activation': 'softmax', 'crossvalidation_idx': None, 'max_steps': None, 'regularizer_type': 1, 'weight_decay': 0.0029441313818097115, 'log_summary_steps': 10, 'warmup_steps': 0, 'augment': False, 'use_amp': False, 'filter_data': True, 'use_trt': False, 'use_xla': False, 'loss': 'cross_entropy', 'epochs': 50, 'pretrained_weights_file': None, 'lr_scheduler': None, 'unet_model': <iva.unet.model.resnet_unet.ResnetUnet object at 0x7f9ea83c1358>, 'key': 'tlt_encode', 'experiment_spec': random_seed: 42
dataset_config {
dataset: "custom"
input_image_type: "grayscale"
train_images_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/ec350625-500b-43bd-879f-9fb592013485/images/train"
train_masks_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/ec350625-500b-43bd-879f-9fb592013485/masks/train"
val_images_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/0914ff02-fd5a-4cad-88ef-faed08165ce4/images/val"
val_masks_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/0914ff02-fd5a-4cad-88ef-faed08165ce4/masks/val"
data_class_config {
target_classes {
name: "foreground"
mapping_class: "foreground"
}
target_classes {
name: "background"
label_id: 1
mapping_class: "background"
}
}
augmentation_config {
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.5
crop_and_resize_prob: 0.5
}
brightness_augmentation {
delta: 0.20000000298023224
}
}
resize_padding: true
resize_method: "BILINEAR"
filter_data: true
}
model_config {
num_layers: 18
use_batch_norm: true
training_precision {
backend_floatx: FLOAT32
}
arch: "resnet"
all_projections: true
model_input_height: 352
model_input_width: 128
model_input_channels: 1
}
training_config {
batch_size: 3
regularizer {
type: L1
weight: 0.0029441313818097115
}
optimizer {
adam {
epsilon: 9.99999993922529e-09
beta1: 0.8526546955108643
beta2: 0.9990000128746033
}
}
checkpoint_interval: 1
log_summary_steps: 10
learning_rate: 0.00040598164196126163
loss: "cross_entropy"
epochs: 50
visualizer {
save_summary_steps: 1
}
data_options: true
}
, 'seed': 42, 'benchmark': False, 'temp_dir': '/tmp/tmp9y5k5vtf', 'num_classes': 2, 'num_conf_mat_classes': 2, 'start_step': 0, 'checkpoint_interval': 1, 'model_json': None, 'custom_objs': {}, 'load_graph': False, 'remove_head': False, 'buffer_size': None, 'data_options': True, 'weights_monitor': False, 'visualize': False, 'save_summary_steps': 1, 'infrequent_save_summary_steps': None, 'enable_qat': False, 'phase': 'val', 'model_size': 183.5315408706665}
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[54947,1],1]
Exit code: 1
--------------------------------------------------------------------------
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL