OSError: Unable to open file (file signature not found)

Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ipxhrz3z because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:67: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:67: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/init.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-05-19 08:51:31,406 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/init.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/init.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-05-19 08:51:31,406 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/init.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-05-19 08:51:31,757 [INFO] main: Loading experiment spec at /workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/training_tlt/specs/egohands_train_resnet34_kitti.txt.
2021-05-19 08:51:31,758 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/training_tlt/specs/egohands_train_resnet34_kitti.txt
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2021-05-19 08:51:31,994 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2021-05-19 08:51:31,994 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2021-05-19 08:51:31,996 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2021-05-19 08:51:32,019 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2021-05-19 08:51:32,020 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2021-05-19 08:51:32,037 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

Traceback (most recent call last):
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py”, line 260, in decode_to_keras
File “/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py”, line 417, in load_model
f = h5dict(filepath, ‘r’)
File “/usr/local/lib/python3.6/dist-packages/keras/utils/io_utils.py”, line 186, in init
self.data = h5py.File(path, mode=mode)
File “/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py”, line 312, in init
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File “/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py”, line 142, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File “h5py/_objects.pyx”, line 54, in h5py._objects.with_phil.wrapper
File “h5py/_objects.pyx”, line 55, in h5py._objects.with_phil.wrapper
File “h5py/h5f.pyx”, line 78, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 797, in
File “”, line 2, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py”, line 46, in wrapped_fn
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 790, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 691, in run_experiment
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 576, in train_gridbox
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 387, in build_gridbox_model
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py”, line 112, in construct_model
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py”, line 101, in model_io
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py”, line 263, in decode_to_keras
OSError: Invalid decryption. Unable to open file (file signature not found). The key used to load the model is incorrect.

Traceback (most recent call last):
File “/usr/local/bin/detectnet_v2”, line 8, in
sys.exit(main())
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/entrypoint/detectnet_v2.py”, line 12, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py”, line 296, in launch_job
AssertionError: Process run failed.
2021-05-19 16:51:34,723 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

This error is occurred when I run the command:
tlt detectnet_v2 train -e $SPECS_DIR/egohands_train_resnet34_kitti.txt
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned_peoplenet
-k $KEY
-n resnet34_detector
–gpus $NUM_GPUS

What should I do to solve it?

Please check if the $KEY is correct.

I think the key value should be OK, because I have successfully run the command:
!tlt detectnet_v2 train -e $SPECS_DIR/egohands_train_resnet18_kitti.txt
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned
-k $KEY
-n resnet18_detector
–gpus $NUM_GPUS


Matching predictions to ground truth, class 1/1.: 100%|█| 2626/2626 [00:00<00:00, 21335.09it/s]
Epoch 120/120

Validation cost: 0.000159
Mean average_precision (in %): 91.4582

class name average precision (in %)


hand 91.4582

Suggest you double check the command you ran into error.
What you have run is as below. Please double check the $SPECS_DIR , $KEY are correct.

tlt detectnet_v2 train -e $SPECS_DIR/egohands_train_resnet34_kitti.txt
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned_peoplenet
-k $KEY
-n resnet34_detector
–gpus $NUM_GPUS

When I run the command:
!tlt detectnet_v2 evaluate -e $SPECS_DIR/egohands_train_resnet18_kitti.txt
-m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt
-k $KEY
It’s successful.That means that the key value is useful.

But when I run the command:
!tlt detectnet_v2 train -e /workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/training_tlt/specs/egohands_train_resnet34_kitti.txt
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned_peoplenet
-k cHFhMDAxYXJyN2FzaTg5ZjVhc21oY29mNjE6MGNkNzM4MmItYzYzZC00OGIyLThhMjUtZDE3Y2IyMWU5NjYz
-n resnet34_detector
–gpus $NUM_GPUS

The same error.

Can you share your /workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/training_tlt/specs/egohands_train_resnet34_kitti.txt ?

random_seed: 42
dataset_config {
data_sources {
tfrecords_path: “/workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/data/tfrecords/kitti_trainval/*”
image_directory_path: “/workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/data/training”
}
image_extension: “jpg”
target_class_mapping {
key: “hand”
value: “hand”
}
validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 1280
output_image_height: 720
min_bbox_width: 10.0
min_bbox_height: 20.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.3
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: “hand”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
}

model_config {
num_layers: 34
pretrained_model_file: “/workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/egohands/pretrained_peoplenet/tlt_peoplenet_vunpruned_v2.1/resnet34_peoplenet.tlt”
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
training_precision {
backend_floatx: FLOAT32
}
arch: “resnet”
all_projections: true
}

evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 30
minimum_detection_ground_truth_overlap {
key: “hand”
value: 0.699999988079
}
evaluation_box_config {
key: “hand”
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: “hand”
class_weight: 1.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: true
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}
training_config {
batch_size_per_gpu: 4
num_epochs: 120
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-06
max_learning_rate: 5e-04
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 10
}
bbox_rasterizer_config {
target_class_config {
key: “hand”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}

Is tlt_peoplenet_vunpruned_v2.1/resnet34_peoplenet.tlt a public model in ngc?

I download this model by executing the command:
!ngc registry model download-version nvidia/tlt_peoplenet:unpruned_v2.1
–dest $LOCAL_EXPERIMENT_DIR/pretrained_peoplenet

So, the key should be tlt_encode .
You can find this info in the model card.
See PeopleNet — Transfer Learning Toolkit 3.0 documentation

But I download the detectnet_v2 model by executing the command:
!ngc registry model download-version nvidia/tlt_pretrained_detectnet_v2:resnet18
–dest $LOCAL_EXPERIMENT_DIR/pretrained_detectnet_v2

Then execute the command:
!tlt detectnet_v2 train -e $SPECS_DIR/egohands_train_resnet18_kitti.txt
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned
-k $KEY
-n resnet18_detector
–gpus $NUM_GPUS
It works well.

And I modify the %env KEY=cHFhMDAxYXJyN2FzaTg5ZjVhc21oY29mNjE6MGNkNzM4MmItYzYzZC00OGIyLThhMjUtZDE3Y2IyMWU5NjYz ,
I guass it should be same as KEY= tlt_encode

Can you share $SPECS_DIR/egohands_train_resnet18_kitti.txt ?

random_seed: 42
dataset_config {
data_sources {
tfrecords_path: “/workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/data/tfrecords/kitti_trainval/*”
image_directory_path: “/workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/data/training”
}
image_extension: “jpg”
target_class_mapping {
key: “hand”
value: “hand”
}
validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 1280
output_image_height: 720
min_bbox_width: 1.0
min_bbox_height: 1.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: “hand”
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 10
}
}
}
}

model_config {
pretrained_model_file: “/workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/egohands/pretrained_detectnet_v2/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5”
num_layers: 18
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
training_precision {
backend_floatx: FLOAT32
}
arch: “resnet”
}

evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 30
minimum_detection_ground_truth_overlap {
key: “hand”
value: 0.699999988079
}
evaluation_box_config {
key: “hand”
value {
minimum_height: 10
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}

cost_function_config {
target_classes {
name: “hand”
class_weight: 1.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: true
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}

training_config {
batch_size_per_gpu: 4
num_epochs: 120
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-06
max_learning_rate: 5e-04
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 10
}
bbox_rasterizer_config {
target_class_config {
key: “hand”
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}

This pretrained_model_file: “/workspace/tlt3.0_experiments/gesture_recognition_tlt_deepstream-main/egohands/pretrained_detectnet_v2/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5” is not a tlt format model. So, you do not need to set a specific key for it. You can just use your own key.

My test is following the document:

And when I run the command:
!ngc registry model download-version nvidia/tlt_peoplenet:unpruned_v2.1
–dest $LOCAL_EXPERIMENT_DIR/pretrained_peoplenet

the resnet34_peoplenet.tlt will be download.

Again, for resnet34_peoplenet.tlt , it is a purpose-built tlt model which is trained by Nvidia, and its key is mentioned as tlt_encode.
End user should use tlt_encode when load it during training.

It works.But how do I update ngc_key in tlt?

What do you mean by “update ngc_key in tlt”?

When I get a new API key from NGC, how do I update the API key?

Do you mean you already train a tlt model, and also you get a new API key , how to update the new key to the tlt model?