• Network Type: Efficientdet-tf1
I can export the originally trained model.
After pruning, I retrained the pruned model.
The retrained model can't be exported.
Is there an issue with my retraining process?
According to this section, retraining requires turning off the regularizer by setting its type to NO_REG.
However, I can't find where to set this in the config file.
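A minimal sketch, assuming (not confirmed by the TAO docs) that in EfficientDet-TF1 regularization is controlled only by the `l1_weight_decay` and `l2_weight_decay` fields of `training_config`, so the closest equivalent of NO_REG would be setting both to 0.0. The hypothetical helper `disable_weight_decay` rewrites those two fields in a spec file's text and leaves every other line unchanged:

```python
import re

# Fields assumed (not confirmed) to control regularization in the
# EfficientDet-TF1 training_config.
DECAY_FIELDS = ("l1_weight_decay", "l2_weight_decay")


def disable_weight_decay(spec_text: str) -> str:
    """Return spec_text with each decay field's value replaced by 0.0.

    Hypothetical helper: it only edits lines of the form
    `<field>: <value>` and leaves all other lines untouched.
    """
    for field in DECAY_FIELDS:
        pattern = re.compile(rf"^(\s*{field}\s*:\s*)\S+", re.MULTILINE)
        # Keep the "field: " prefix (group 1) and swap only the value.
        spec_text = pattern.sub(lambda m: m.group(1) + "0.0", spec_text)
    return spec_text


if __name__ == "__main__":
    spec = (
        "training_config {\n"
        "  learning_rate: 0.02\n"
        "  l2_weight_decay: 0.00001\n"
        "  l1_weight_decay: 0.0\n"
        "}"
    )
    print(disable_weight_decay(spec))
```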
The retraining config section is as follows:
training_config {
  checkpoint: "/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/pruned/model.tlt"
  pruned_model_path: "/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/pruned/model.tlt"
  train_batch_size: 16
  iterations_per_loop: 10
  checkpoint_period: 2
  num_examples_per_epoch: 348
  num_epochs: 50
  tf_random_seed: 42
  lr_warmup_epoch: 3
  lr_warmup_init: 0.002
  learning_rate: 0.02
  amp: True
  moving_average_decay: 0.9999
  l2_weight_decay: 0.00001
  l1_weight_decay: 0.0
}
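The step number in the export log can be cross-checked against this config. Assuming (not verified against the TAO source) that the total step count is num_examples_per_epoch × num_epochs ÷ train_batch_size with truncation, 348 × 50 ÷ 16 = 1087.5 → 1087, which matches the `model.ckpt-1087` checkpoint restored during export, so the export appears to be reading the final retrained weights. A short sketch of that arithmetic (`total_train_steps` is a hypothetical name):

```python
def total_train_steps(num_examples_per_epoch: int, num_epochs: int,
                      train_batch_size: int) -> int:
    """Estimated global step count for a run (assumed formula, truncating division)."""
    return (num_examples_per_epoch * num_epochs) // train_batch_size


if __name__ == "__main__":
    # Values copied from the retraining training_config.
    print(total_train_steps(num_examples_per_epoch=348, num_epochs=50,
                            train_batch_size=16))  # 1087
```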
The full error output is as follows:
root@f054306bf6b5:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1# rm -rf results/retrain/export/*
root@f054306bf6b5:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1# efficientdet_tf1 export -m /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/model.epoch-49.tlt -o /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/export/model.etlt -e /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt -k nvidia_tao
2023-10-27 05:29:16.376833: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-10-27 05:29:16,440 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-10-27 05:29:17.450068: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
Using TensorFlow backend.
2023-10-27 05:29:17,842 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2023-10-27 05:29:17,889 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2023-10-27 05:29:17,893 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2023-10-27 05:29:19.191514: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libnvinfer.so.8
2023-10-27 05:29:19.215000: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
Loading experiment spec at %s. /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt
INFO:nvidia_tao_tf1.cv.efficientdet.utils.spec_loader:Merging specification from /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt
INFO:root:Starting EfficientDet export.
INFO:root:Loading weights from /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/model.epoch-49.tlt
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py:155: The name tf.enable_resource_variables is deprecated. Please use tf.compat.v1.enable_resource_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py:155: The name tf.enable_resource_variables is deprecated. Please use tf.compat.v1.enable_resource_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:555: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:555: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:559: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:559: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:570: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:570: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
INFO:nvidia_tao_tf1.cv.efficientdet.models.anchors:Using tf version of post-processing.
INFO:nvidia_tao_tf1.cv.efficientdet.models.anchors:Using native nms.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:198: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:198: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:210: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:210: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:212: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:212: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
INFO:tensorflow:Restoring parameters from /tmp/tmpk_6f91g4/model.ckpt-1087
INFO:tensorflow:Restoring parameters from /tmp/tmpk_6f91g4/model.ckpt-1087
INFO:root:Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
(1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
main()
File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
driver.build()
File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
restore_ckpt(
File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
saver = tf.train.Saver(var_dict, max_to_keep=1)
File "/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/tensorflow_core/python/training/saver.py", line 868, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
restore_op = self._AddRestoreOps(filename_tensor, saveables,
File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
_, _, _op = _op_def_lib._apply_op_helper(
File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
ret = Operation(
File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1349, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1441, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
[[{{node save/RestoreV2}}]]
(1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1289, in restore
sess.run(self.saver_def.restore_op_name,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 955, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1179, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1358, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
(1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
main()
File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
driver.build()
File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
restore_ckpt(
File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
saver = tf.train.Saver(var_dict, max_to_keep=1)
File "/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/tensorflow_core/python/training/saver.py", line 868, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
restore_op = self._AddRestoreOps(filename_tensor, saveables,
File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
_, _, _op = _op_def_lib._apply_op_helper(
File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
ret = Operation(
File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1300, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 312, in <module>
raise e
File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
driver.build()
File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
restore_ckpt(
File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 218, in restore_ckpt
saver.restore(sess, ckpt_path)
File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1305, in restore
raise _wrap_restore_error_with_msg(
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
(1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
[[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
main()
File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
driver.build()
File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
restore_ckpt(
File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
saver = tf.train.Saver(var_dict, max_to_keep=1)
File "/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/tensorflow_core/python/training/saver.py", line 868, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
restore_op = self._AddRestoreOps(filename_tensor, saveables,
File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
_, _, _op = _op_def_lib._apply_op_helper(
File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
ret = Operation(
File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
Execution status: FAIL
root@f054306bf6b5:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1#
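The error reports that `top_bn/beta/ExponentialMovingAverage` is missing from the checkpoint. That key follows the naming TensorFlow's `tf.train.ExponentialMovingAverage` uses for moving-average shadow variables (`<variable>/ExponentialMovingAverage`), and the spec sets `moving_average_decay: 0.9999`. A minimal diagnostic sketch that counts the EMA shadow variables actually stored in the retrained checkpoint. It assumes a plain TensorFlow checkpoint prefix is available (the `.tlt` file is not one; the log shows export restoring from an unpacked copy at `/tmp/tmpk_6f91g4/model.ckpt-1087`). `ema_name` and `check_ema` are hypothetical helpers; `tf.train.list_variables` is a standard TensorFlow function.

```python
import sys
from typing import Dict, Iterable

# Suffix tf.train.ExponentialMovingAverage appends to shadow-variable names
# when left at its default name "ExponentialMovingAverage".
EMA_SUFFIX = "/ExponentialMovingAverage"


def ema_name(var_name: str) -> str:
    """Checkpoint key of the EMA shadow for a base variable name."""
    return var_name + EMA_SUFFIX


def check_ema(var_names: Iterable[str], base_names: Iterable[str]) -> Dict[str, bool]:
    """Map each base variable name to whether its EMA shadow is in var_names."""
    present = set(var_names)
    return {base: ema_name(base) in present for base in base_names}


def main(ckpt_prefix: str) -> None:
    # Imported here so the helpers above work without TensorFlow installed.
    import tensorflow as tf

    # list_variables returns (name, shape) pairs stored in the checkpoint.
    names = [name for name, _shape in tf.train.list_variables(ckpt_prefix)]
    num_ema = sum(1 for n in names if n.endswith(EMA_SUFFIX))
    print(f"{len(names)} variables, {num_ema} EMA shadow variables")
    # top_bn/beta is the variable named in the export error.
    for base, found in check_ema(names, ["top_bn/beta"]).items():
        print(f"{ema_name(base)}: {'present' if found else 'MISSING'}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with the checkpoint prefix as its only argument. If it reports zero EMA shadow variables, that would be consistent with the export graph expecting EMA weights that the retraining run never saved; this is an inference from the error message, not a confirmed diagnosis.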