When prune is executed, "OSError: Invalid decryption. Unable to open file (file signature not found)." occurs

An error occurs when I try to prune a MaskRCNN model that I trained via transfer learning.

  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 247, in decode_to_keras
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 417, in load_model
    f = h5dict(filepath, 'r')
  File "/usr/local/lib/python3.6/dist-packages/keras/utils/io_utils.py", line 186, in __init__
    self.data = h5py.File(path, mode=mode)
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-prune", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_prune.py", line 178, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_prune.py", line 113, in run_pruning
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 250, in decode_to_keras
OSError: Invalid decryption. Unable to open file (file signature not found). The key used to load the model is incorrect.

I am using TLT 2.0.
My API key ($APIKEY) is specified as the key for both transfer learning and pruning.
I also tried "tlt_encode" and "nvidia_tlt" as the key, with no effect.

Here is the command I used:
tlt-prune -m $model_file -o $out_dir -k $APIKEY

Here is the spec file we used for transfer learning.

seed: 123
use_amp: False
warmup_steps: 1000
checkpoint: "/workspace/tlt-experiments/maskrcnn/pretrained/resnet50.hdf5"
learning_rate_steps: "[10000, 15000, 20000]"
learning_rate_decay_levels: "[0.1, 0.02, 0.01]"
total_steps: 25000
train_batch_size: 1
eval_batch_size: 1
num_steps_per_eval: 5000
momentum: 0.9
l2_weight_decay: 0.0001
warmup_learning_rate: 0.0001
init_learning_rate: 0.01

data_config{
    image_size: "(640, 448)"
    augment_input_data: True
    eval_samples: 500
    training_file_pattern: "/workspace/tlt-experiments/datasets/coco_tfrecords/train*.tfrecord"
    validation_file_pattern: "/workspace/tlt-experiments/datasets/coco_tfrecords/val*.tfrecord"
    val_json_file: "/workspace/tlt-experiments/coco/annotations/instances_val2017.json"

    # dataset specific parameters
    num_classes: 91
    skip_crowd_during_training: True
}

maskrcnn_config {
    nlayers: 50
    arch: "resnet"
    freeze_bn: True
    freeze_blocks: "[0,1]"
    gt_mask_size: 112

    # Region Proposal Network
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_min_size: 0.

    # Proposal layer.
    batch_size_per_im: 512
    fg_fraction: 0.25
    fg_thresh: 0.5
    bg_thresh_hi: 0.5
    bg_thresh_lo: 0.

    # Faster-RCNN heads.
    fast_rcnn_mlp_head_dim: 1024
    bbox_reg_weights: "(10., 10., 5., 5.)"

    # Mask-RCNN heads.
    include_mask: True
    mrcnn_resolution: 28

    # training
    train_rpn_pre_nms_topn: 2000
    train_rpn_post_nms_topn: 1000
    train_rpn_nms_threshold: 0.7

    # evaluation
    test_detections_per_image: 100
    test_nms: 0.5
    test_rpn_pre_nms_topn: 1000
    test_rpn_post_nms_topn: 1000
    test_rpn_nms_thresh: 0.7

    # model architecture
    min_level: 2
    max_level: 6
    num_scales: 1
    aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
    anchor_scale: 8

    # localization loss
    rpn_box_loss_weight: 1.0
    fast_rcnn_box_loss_weight: 1.0
    mrcnn_weight_loss_mask: 1.0
}

Please help me.

Which key did you use when you ran tlt-train?

I used my API key ($APIKEY).

So you should use the same key when you run tlt-prune.

I am using the same API key for tlt-prune.

You can try a small experiment.
Use 123 as the key and run tlt-train for 1 epoch.
Then run tlt-prune with the same key, 123.

Check whether the issue is gone.
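For example, something along these lines (the spec path and output directory are illustrative, not your exact paths):

tlt-train mask_rcnn -e /workspace/tlt-experiments/maskrcnn/specs/maskrcnn_train_resnet50.txt -d /workspace/tlt-experiments/maskrcnn/results_key_test -k 123 --gpus 1
tlt-prune -m /workspace/tlt-experiments/maskrcnn/results_key_test/model.step-1000.tlt -o /workspace/tlt-experiments/maskrcnn/prune_out/prune_model.tlt -k 123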

I tried training and pruning with the key 123, but I get the same error.

Can you change the above checkpoint to your pruned model and retry?

checkpoint: "/workspace/tlt-experiments/maskrcnn/pretrained/resnet50.hdf5"
This is what was specified at training time.
By "pruned model", do you mean the transfer-learned model?
Also, why would I change this to a pruned model?

Ignore my previous comment.
For the pruned model, please run the tao mask_rcnn train command with an updated spec file that points to the newly pruned model by setting pruned_model_path.

You can find the example retraining spec in the Jupyter notebook.

seed: 123
use_amp: False
warmup_steps: 1000
learning_rate_steps: "[10000, 15000, 20000]"
learning_rate_decay_levels: "[0.1, 0.02, 0.01]"
total_steps: 25000
train_batch_size: 2
eval_batch_size: 4
num_steps_per_eval: 5000
momentum: 0.9
l2_weight_decay: 0.0001
warmup_learning_rate: 0.0001
init_learning_rate: 0.01
pruned_model_path: "/workspace/tao-experiments/mask_rcnn/experiment_dir_pruned/model.tlt"
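After updating the spec, the retraining command would look roughly like this (the spec and output paths are illustrative):

tao mask_rcnn train -e /workspace/tao-experiments/mask_rcnn/specs/maskrcnn_retrain_resnet50.txt -d /workspace/tao-experiments/mask_rcnn/experiment_dir_retrain -k $KEY --gpus 1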

Is this for retraining?
I am getting the error when I run prune, so I cannot create a pruned model in the first place.

Yes, it is for retraining.

Please use an explicit key value and train for 1 epoch again.
For tlt-prune, to narrow this down, can you use an explicit key value on the command line?

tlt-prune -m /workspace/tlt-experiments/maskrcnn/results/model.step-1000.tlt -o /workspace/tlt-experiments/maskrcnn/prune_out/prune_model.tlt -k 123
I am using this command to run tlt-prune.
The model "model.step-1000.tlt" was also trained with the key "123".

To narrow this down, can you download an official mask_rcnn model and try to prune it?

wget https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplesegnet/versions/trainable_v2.1/files/peoplesegnet_resnet50.step-20000.tlt

Then try to run tlt-prune.

The key is nvidia_tlt

See PeopleSegNet | NVIDIA NGC
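For example (the output path is illustrative):

tlt-prune -m peoplesegnet_resnet50.step-20000.tlt -o /workspace/tlt-experiments/maskrcnn/prune_out/peoplesegnet_pruned.tlt -k nvidia_tlt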

I have run tlt-prune using the official mask_rcnn model and am getting the same error.

Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 247, in decode_to_keras
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 417, in load_model
    f = h5dict(filepath, 'r')
  File "/usr/local/lib/python3.6/dist-packages/keras/utils/io_utils.py", line 186, in __init__
    self.data = h5py.File(path, mode=mode)
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-prune", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_prune.py", line 178, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_prune.py", line 113, in run_pruning
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 250, in decode_to_keras
OSError: Invalid decryption. Unable to open file (file signature not found). The key used to load the model is incorrect.

I ran prune using this command.
tlt-prune -m /workspace/tlt-experiments/maskrcnn/peoplesegnet_resnet50.step-20000.tlt -o /workspace/tlt-experiments/maskrcnn/prune_out/prune_model.tlt -k nvidia_tlt

According to the TLT 2.0 user guide (Pruning the Model — Transfer Learning Toolkit 2.0 documentation),
can you try using "-pm" instead of "-m"?

BTW, according to the TLT 2.0 user guide,

currently tlt-prune does not support MaskRCNN models.

So I suggest you use TLT 3.0 or the latest TAO.
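In TAO, prune is invoked per network, so the command would look roughly like this (paths and pruning threshold are illustrative):

tao mask_rcnn prune -m /workspace/tao-experiments/mask_rcnn/experiment_dir_unpruned/model.step-25000.tlt -o /workspace/tao-experiments/mask_rcnn/experiment_dir_pruned/model.tlt -k $KEY -pth 0.5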


Thank you so much.
I had overlooked that part of the documentation.
If I want to prune a MaskRCNN model, I will use TLT 3.0 or TAO.
