OSError: Unable to open file (file signature not found)

2022-01-13 17:38:14,533 [INFO] root: Registry: ['nvcr.io']
2022-01-13 17:38:14,706 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-13 17:38:15,746 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/chenhongzhao/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-01-13 09:38:18.074014: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-01-13 09:38:22,304 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 312, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 142, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

During handling of the above exception, another exception occurred:

fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py", line 153, in decode_to_keras
OSError: Invalid decryption. Unable to open file (file signature not found). The key used to load the model is incorrect.
Traceback (most recent call last):

#################################################################
I am running this project: “…/cv_samples_v1.3.0/bpnet”.
All the previous steps worked; the error appears when training starts.

this is my API key:OXE5aDZnZmZpYmIwYW90OW12dTVlNXZzZTY6ZDgyYjZmYjItN2U3Yy00ODE5LWFhMmItY2YxZjUyYjM2NWVk
I also tried using “tlt_encode”, but it had no effect.

I have also logged in to the registry:
docker login nvcr.io
Username: $oauthtoken
Password: OXE5aDZnZmZpYmIwYW90OW12dTVlNXZzZTY6ZDgyYjZmYjItN2U3Yy00ODE5LWFhMmItY2YxZjUyYjM2NWVk

Please help me.

May I know which network you ran? Can you share the full command line?

!ngc registry model download-version nvidia/tao/bodyposenet:trainable_v1.0 \
--dest $LOCAL_EXPERIMENT_DIR/pretrained_model

Out:
Downloaded 64.33 MB in 16s, Download speed: 4.02 MB/s

Transfer id: bodyposenet_vtrainable_v1.0 Download status: Completed.
Downloaded local path: …/cv_samples_v1.3.0/bpnet/pretrained_model/bodyposenet_vtrainable_v1-1.0
Total files downloaded: 1
Total downloaded size: 64.33 MB
Started at: 2022-01-13 17:36:15.091512
Completed at: 2022-01-13 17:36:31.115565
Duration taken: 16s

Can you share the full command line you use to run the training?

!tao bpnet train -e $SPECS_DIR/bpnet_train_m1_coco.yaml \
-r $USER_EXPERIMENT_DIR/models/exp_m1_unpruned \
-k $KEY \
--gpus $NUM_GPUS

Please change
-k $KEY

to
-k nvidia_tlt

and retry.
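
If the notebook takes the key from an environment variable named KEY (which `-k $KEY` suggests, though the thread does not show where it is set), another option is to change the variable once in a notebook cell and leave the training command as it is. A minimal sketch under that assumption:

```python
import os

# Assumption: later `!tao ...` cells pick up $KEY from the kernel's environment.
# If the notebook defines KEY somewhere else, change it there instead.
os.environ["KEY"] = "nvidia_tlt"  # load key for the pretrained NGC model

print(os.environ["KEY"])
```

Run this cell before the training cell so that the new value is in place when `-k $KEY` is expanded.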


OK, thank you for your help. But why does it succeed with that key?

Because you are using the pretrained bpnet model from NGC as the pretrained model for your training run, and its load key is nvidia_tlt.
The gazenet model card (Gaze Estimation | NVIDIA NGC) states this for a similar model.
The bpnet model card does not mention the key, but it should be the same:

The trainable and deployable models are encrypted and will only operate with the following key:

  • Model load key: nvidia_tlt
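
To see why a wrong key ends in "file signature not found": the traceback shows the decoded model being opened with h5py, and an HDF5 file is recognized by a fixed 8-byte signature at its start. The sketch below only illustrates that check. It is not the TAO decryption code, `looks_like_hdf5` is a hypothetical helper, and the byte strings are made-up stand-ins for decrypted output. It also looks only at offset 0, while HDF5 allows the signature at a few later offsets as well.

```python
# The 8-byte signature that marks the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(data: bytes) -> bool:
    """Return True if the bytes begin with the HDF5 signature."""
    return data.startswith(HDF5_SIGNATURE)


# Stand-in for a model decrypted with the correct key: a valid HDF5 header.
good = HDF5_SIGNATURE + b"rest of the file"
# Stand-in for a model decrypted with the wrong key: arbitrary bytes.
bad = b"\x13\x37 arbitrary bytes from a wrong key"

print(looks_like_hdf5(good))
print(looks_like_hdf5(bad))
```

With the wrong key the decrypted bytes do not start with this signature, so the HDF5 open fails, and the tool reports it as "The key used to load the model is incorrect."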

I see. Thank you very much.