Can't export retrained model

• Network Type: Efficientdet-tf1

I can export the originally trained model.
After pruning, I retrained the pruned model.
The retrained model can’t be exported.

Is there an issue with my retraining process?
In this section, retraining needs the regularizer type set to NO_REG to turn regularization off.
I can’t find where to set this in the config file.

The retraining part of the config is as follows.

training_config {
  checkpoint: "/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/pruned/model.tlt"
  pruned_model_path: "/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/pruned/model.tlt"
  train_batch_size: 16
  iterations_per_loop: 10
  checkpoint_period: 2
  num_examples_per_epoch: 348
  num_epochs: 50
  tf_random_seed: 42
  lr_warmup_epoch: 3
  lr_warmup_init: 0.002
  learning_rate: 0.02
  amp: True
  moving_average_decay: 0.9999
  l2_weight_decay: 0.00001
  l1_weight_decay: 0.0
}
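
The block above has no field literally named after a regularizer type; the only regularization-related entries visible in it are l1_weight_decay and l2_weight_decay. As a minimal sketch for pulling those values out of a spec for review, the snippet below parses flat `key: value` lines. The helper name `parse_flat_fields` is hypothetical and is not part of TAO; it ignores nested blocks and is only meant for a quick look at a pasted fragment.

```python
import re

# A shortened copy of the training_config block from the post.
SPEC_TEXT = '''
training_config {
  train_batch_size: 16
  moving_average_decay: 0.9999
  l2_weight_decay: 0.00001
  l1_weight_decay: 0.0
}
'''


def parse_flat_fields(text):
    """Collect `key: value` pairs from a flat protobuf-text block (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r'^\s*(\w+)\s*:\s*(.+?)\s*$', line)
        if match:
            # Strip surrounding quotes so string values such as paths come out clean.
            fields[match.group(1)] = match.group(2).strip('"')
    return fields


fields = parse_flat_fields(SPEC_TEXT)
# Keep only the weight-decay entries and convert them to numbers.
decay_fields = {k: float(v) for k, v in fields.items() if k.endswith("_weight_decay")}
print(decay_fields)
```

Whether setting both values to 0.0 is equivalent to NO_REG for this network is an assumption that should be checked against the EfficientDet (TF1) spec documentation.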

The full error output is as follows.

root@f054306bf6b5:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1# rm -rf results/retrain/export/*
root@f054306bf6b5:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1# efficientdet_tf1 export \
    -m /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/model.epoch-49.tlt \
    -o /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/export/model.etlt \
    -e /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt \
    -k nvidia_tao
2023-10-27 05:29:16.376833: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-10-27 05:29:16,440 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-10-27 05:29:17.450068: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
Using TensorFlow backend.
2023-10-27 05:29:17,842 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-27 05:29:17,889 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-27 05:29:17,893 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-27 05:29:19.191514: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libnvinfer.so.8
2023-10-27 05:29:19.215000: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
Loading experiment spec at %s. /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt
INFO:nvidia_tao_tf1.cv.efficientdet.utils.spec_loader:Merging specification from /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt
INFO:root:Starting EfficientDet export.
INFO:root:Loading weights from /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/model.epoch-49.tlt
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py:155: The name tf.enable_resource_variables is deprecated. Please use tf.compat.v1.enable_resource_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py:155: The name tf.enable_resource_variables is deprecated. Please use tf.compat.v1.enable_resource_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:555: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:555: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:559: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:559: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:570: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:570: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
INFO:nvidia_tao_tf1.cv.efficientdet.models.anchors:Using tf version of post-processing.
INFO:nvidia_tao_tf1.cv.efficientdet.models.anchors:Using native nms.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:198: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:198: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:210: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:210: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:212: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:212: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

INFO:tensorflow:Restoring parameters from /tmp/tmpk_6f91g4/model.ckpt-1087
INFO:tensorflow:Restoring parameters from /tmp/tmpk_6f91g4/model.ckpt-1087
INFO:root:Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
         [[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
    main()
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
    driver.build()
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
    restore_ckpt(
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
    saver = tf.train.Saver(var_dict, max_to_keep=1)
  File "/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/tensorflow_core/python/training/saver.py", line 868, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
    restore_op = self._AddRestoreOps(filename_tensor, saveables,
  File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
    all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
  File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
    _, _, _op = _op_def_lib._apply_op_helper(
  File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1349, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1441, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[{{node save/RestoreV2}}]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[{{node save/RestoreV2}}]]
         [[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1289, in restore
    sess.run(self.saver_def.restore_op_name,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 955, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1179, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1358, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
         [[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
    main()
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
    driver.build()
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
    restore_ckpt(
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
    saver = tf.train.Saver(var_dict, max_to_keep=1)
  File "/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/tensorflow_core/python/training/saver.py", line 868, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
    restore_op = self._AddRestoreOps(filename_tensor, saveables,
  File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
    all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
  File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
    _, _, _op = _op_def_lib._apply_op_helper(
  File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1300, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 312, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
    driver.build()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
    restore_ckpt(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 218, in restore_ckpt
    saver.restore(sess, ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1305, in restore
    raise _wrap_restore_error_with_msg(
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
         [[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
    main()
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
    driver.build()
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
    restore_ckpt(
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
    saver = tf.train.Saver(var_dict, max_to_keep=1)
  File "/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/tensorflow_core/python/training/saver.py", line 868, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
    restore_op = self._AddRestoreOps(filename_tensor, saveables,
  File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
    all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
  File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
    _, _, _op = _op_def_lib._apply_op_helper(
  File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Execution status: FAIL
root@f054306bf6b5:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1#
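
The trace above repeats the same failure several times, which makes it hard to see how many distinct keys are actually missing. A minimal sketch for condensing such a log, assuming the message format "Key <name> not found in checkpoint" seen in the output (the function name `missing_checkpoint_keys` is hypothetical):

```python
import re

# A few lines copied from the export log above.
LOG_EXCERPT = """\
2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint
"""

MISSING_KEY = re.compile(r"Key (\S+) not found in checkpoint")


def missing_checkpoint_keys(log_text):
    """Return the distinct missing keys, in order of first appearance."""
    seen = []
    for key in MISSING_KEY.findall(log_text):
        if key not in seen:
            seen.append(key)
    return seen


keys = missing_checkpoint_keys(LOG_EXCERPT)
print(keys)
```

On this log only two distinct keys come out. Per the trace, the _CHECKPOINTABLE_OBJECT_GRAPH one is raised inside object_graph_key_mapping while the first error is being handled, so top_bn/beta/ExponentialMovingAverage is the one to look at first.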

Could you run the "Evaluate retrained model" section successfully? If yes, the retraining process has no issue.

Please try creating a new folder which contains only model.epoch-49.tlt, then retry.
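
A minimal sketch of that isolation step, assuming it is done from Python rather than by hand. The helper `isolate_checkpoint` and the folder name `retrain_only_epoch49` are hypothetical, and the demo uses stand-in files in a temporary directory instead of the real results/retrain folder.

```python
import shutil
import tempfile
from pathlib import Path


def isolate_checkpoint(model_path, target_dir):
    """Copy a single model file into a new directory and return the copy's path."""
    target = Path(target_dir)
    # exist_ok=False: fail rather than reuse a folder that may hold other files.
    target.mkdir(parents=True, exist_ok=False)
    return Path(shutil.copy2(model_path, target))


# Demo with placeholder files; point `source` at the real .tlt file instead.
work = Path(tempfile.mkdtemp())
source = work / "retrain" / "model.epoch-49.tlt"
source.parent.mkdir()
source.write_bytes(b"placeholder")
(source.parent / "model.epoch-48.tlt").write_bytes(b"other checkpoint")

copied = isolate_checkpoint(source, work / "retrain_only_epoch49")
print(sorted(p.name for p in copied.parent.iterdir()))
```

The export command would then be pointed at the copied file with -m.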

I still get the error, even with only one file left in the folder.

The log messages are as follows.

root@cadf0713c5f7:/workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain# efficientdet_tf1 export \
    -m /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/model.epoch-49.tlt \
    -o /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/export/model.onnx \
    -e /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt \
    -k nvidia_tao
2023-10-30 03:39:17.535090: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-10-30 03:39:17,599 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-10-30 03:39:18.596334: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
Using TensorFlow backend.
2023-10-30 03:39:18,991 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-30 03:39:19,039 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-30 03:39:19,043 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2023-10-30 03:39:20.355531: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libnvinfer.so.8
2023-10-30 03:39:20.379170: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
Loading experiment spec at %s. /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt
INFO:nvidia_tao_tf1.cv.efficientdet.utils.spec_loader:Merging specification from /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/specs/efficientdet_d0_train.txt
INFO:nvidia_tao_tf1.cv.common.logging.logging:Log file already exists at /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/export/status.json
INFO:root:Starting EfficientDet export.
INFO:root:Loading weights from /workspace/Nyan/tao_source_codes_v5.0.0/notebooks/tao_launcher_starter_kit/efficientdet_tf1/results/retrain/model.epoch-49.tlt
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py:155: The name tf.enable_resource_variables is deprecated. Please use tf.compat.v1.enable_resource_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py:155: The name tf.enable_resource_variables is deprecated. Please use tf.compat.v1.enable_resource_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:555: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:555: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:559: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:559: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:570: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:570: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
INFO:nvidia_tao_tf1.cv.efficientdet.models.anchors:Using tf version of post-processing.
INFO:nvidia_tao_tf1.cv.efficientdet.models.anchors:Using native nms.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:198: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:198: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:210: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:210: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:212: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py:212: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

INFO:tensorflow:Restoring parameters from /tmp/tmpv70oae43/model.ckpt-1087
INFO:tensorflow:Restoring parameters from /tmp/tmpv70oae43/model.ckpt-1087
INFO:root:Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
         [[save/RestoreV2/_1589]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
    main()
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
    driver.build()
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
    restore_ckpt(
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
    saver = tf.train.Saver(var_dict, max_to_keep=1)
  File "/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/tensorflow_core/python/training/saver.py", line 868, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
    restore_op = self._AddRestoreOps(filename_tensor, saveables,
  File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
    all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
  File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
    _, _, _op = _op_def_lib._apply_op_helper(
  File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1349, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1441, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[{{node save/RestoreV2}}]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[{{node save/RestoreV2}}]]
         [[save/RestoreV2/_1589]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1289, in restore
    sess.run(self.saver_def.restore_op_name,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 955, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1179, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1358, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
         [[save/RestoreV2/_1589]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
    main()
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
    driver.build()
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
    restore_ckpt(
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
    saver = tf.train.Saver(var_dict, max_to_keep=1)
  File "/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/tensorflow_core/python/training/saver.py", line 868, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
    restore_op = self._AddRestoreOps(filename_tensor, saveables,
  File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
    all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
  File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
    _, _, _op = _op_def_lib._apply_op_helper(
  File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1300, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 312, in <module>
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
    driver.build()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
    restore_ckpt(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 218, in restore_ckpt
    saver.restore(sess, ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 1305, in restore
    raise _wrap_restore_error_with_msg(
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Key top_bn/beta/ExponentialMovingAverage not found in checkpoint
         [[node save/RestoreV2 (defined at /tensorflow_core/python/framework/ops.py:1748) ]]
         [[save/RestoreV2/_1589]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 300, in <module>
    main()
  File "/nvidia_tao_tf1/cv/efficientdet/scripts/export.py", line 163, in main
    driver.build()
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 582, in build
    restore_ckpt(
  File "/nvidia_tao_tf1/cv/efficientdet/inferencer/inference.py", line 212, in restore_ckpt
    saver = tf.train.Saver(var_dict, max_to_keep=1)
  File "/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/tensorflow_core/python/training/saver.py", line 868, in _build
    self.saver_def = self._builder._build_internal(  # pylint: disable=protected-access
  File "/tensorflow_core/python/training/saver.py", line 507, in _build_internal
    restore_op = self._AddRestoreOps(filename_tensor, saveables,
  File "/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
    all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
  File "/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
    _, _, _op = _op_def_lib._apply_op_helper(
  File "/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
    op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
  File "/tensorflow_core/python/util/deprecation.py", line 513, in new_func
    return func(*args, **kwargs)
  File "/tensorflow_core/python/framework/ops.py", line 3356, in create_op
    return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
  File "/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
    ret = Operation(
  File "/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Execution status: FAIL

Thanks for the info. I will check further.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

I cannot reproduce the export error with a retrained model. To narrow it down, I suggest following the notebook: train for 1 epoch, prune, retrain the pruned model for 1 epoch, and then retry the export. Also, please run evaluation against the retrained model to check whether the results are as expected.
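Another way to narrow this down is to check whether the `ExponentialMovingAverage` keys named in the error are actually stored in the checkpoint. The following is only a minimal sketch under some assumptions: it assumes you have a plain TensorFlow checkpoint prefix to inspect (a `.tlt` file may need to be decoded first, which is not covered here), and the helper names `missing_keys`, `count_ema_keys`, and `list_checkpoint_keys` are hypothetical, not part of TAO. The only TensorFlow call used is `tf.train.list_variables`.

```python
from typing import Iterable, List

# Suffix TensorFlow appends to the shadow copies of moving-average variables,
# as seen in the error: top_bn/beta/ExponentialMovingAverage
EMA_SUFFIX = "/ExponentialMovingAverage"


def missing_keys(expected: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the expected variable names that are absent from the checkpoint."""
    available_set = set(available)
    return sorted(name for name in expected if name not in available_set)


def count_ema_keys(available: Iterable[str]) -> int:
    """Count how many stored variable names are moving-average shadow variables."""
    return sum(1 for name in available if name.endswith(EMA_SUFFIX))


def list_checkpoint_keys(ckpt_prefix: str) -> List[str]:
    """List variable names in a plain TensorFlow checkpoint (assumption: not encrypted)."""
    # Imported lazily so the pure-Python helpers above work without TensorFlow.
    import tensorflow as tf

    return [name for name, _shape in tf.train.list_variables(ckpt_prefix)]


if __name__ == "__main__":
    # Hypothetical key list standing in for list_checkpoint_keys("<ckpt prefix>").
    stored = ["top_bn/beta", "top_bn/gamma"]
    wanted = ["top_bn/beta/ExponentialMovingAverage"]
    print("EMA keys stored:", count_ema_keys(stored))
    print("Missing keys:", missing_keys(wanted, stored))
```

If the retrained checkpoint contains no keys ending in `/ExponentialMovingAverage` while the original trained checkpoint does, that would point at the retraining run not saving moving-average variables, rather than at the export step itself.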

Evaluation on the retrained model has no issue; only export fails. Thanks, let me try that.
