Error while training EfficientDet with efficientnet_b1

Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/efficientde$
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/efficientde$
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/efficientde$
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/efficientde$
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/efficientde$
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/efficientde$
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/efficientde$
File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 417, in load_model
f = h5dict(filepath, 'r')
File "/usr/local/lib/python3.6/dist-packages/keras/utils/io_utils.py", line 186, in __init__
self.data = h5py.File(path, mode=mode)
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 312, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 142, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.

Please check whether the NGC API key is correct.

Yes, it is correct. I have also downloaded efficientnet_b0 and efficientnet_b2, and they both work fine. I am getting this error only for efficientnet_b1.

Please try downloading efficientnet_b1 again. The downloaded file may be broken.
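
One way to test whether a downloaded file is broken is to look for the HDF5 format signature, since the traceback fails inside `h5py` with "file signature not found". The sketch below assumes the pretrained weights are expected to be a plain HDF5 file, which the `load_model` call in the traceback suggests. The function name `has_hdf5_signature` and the `max_offset` cap are illustrative choices, not part of TAO.

```python
from pathlib import Path

# The 8-byte HDF5 format signature. Without a user block it sits at
# offset 0; with a user block it sits at offset 512, 1024, 2048, ...
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path, max_offset=1 << 20):
    """Return True if the HDF5 signature is found at offset 0 or at a
    doubling offset starting from 512.

    max_offset is an arbitrary cap (assumption) so the scan stays cheap.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size and offset <= max_offset:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False


if __name__ == "__main__":
    import sys

    # Pass the files to check on the command line.
    for name in sys.argv[1:]:
        print(name, "HDF5 signature found:", has_hdf5_signature(name))
```

An empty file, or one that holds something other than HDF5 data (for example a saved error page), returns False here. If `h5py` is importable, `h5py.is_hdf5(path)` should give a comparable answer.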

I tried deleting and re-downloading it 3-4 times, but the error persists every time.

Can you share the full training command and the training spec file?

training_config {
   train_batch_size: 4
   iterations_per_loop: 10
   checkpoint_period: 1
   num_examples_per_epoch: 56897
   num_epochs: 20
   #model_name: 'efficientdet-d1'
   profile_skip_steps: 100
   tf_random_seed:  42
   lr_warmup_epoch: 5
   lr_warmup_init: 1e-05
   learning_rate: 0.001
   amp: True
   moving_average_decay: 0.9999
   l2_weight_decay: 0.0001
   l1_weight_decay: 0.0
   checkpoint: "/workspace/TAO/efficientdet_pretrained_models/efficientnet_b1.hdf5"
}
dataset_config {
   num_classes: 17
   image_size: "544,960"
   training_file_pattern: "/workspace/TAO/T_1/tfrecords/train-*"
   validation_file_pattern: "/workspace/TAO/T_1/tfrecords/val-*"
   validation_json_file: "/workspace/val/val_COCO.json"
   max_instances_per_image: 100
   skip_crowd_during_training: True
}
eval_config {
   eval_batch_size: 4
   eval_epoch_cycle: 1
   eval_after_training: True
   eval_samples: 20561
   min_score_thresh: 0.4
   max_detections_per_image: 100
}
model_config {
   model_name: "efficientdet-d1"
   min_level: 3
   max_level: 7
   num_scales: 3
   aspect_ratios : '[(1,1), (1.73,0.57), (0.57,1.73)]'
   anchor_scale : 4
}
augmentation_config {
   rand_hflip: True
   random_crop_min_scale: 0.1
   random_crop_max_scale: 2.0
}

tao efficientdet train --gpus 1 -e /workspace/TAO/T_1/experiment_spec.txt -d /workspace/TAO/T_1/weights -k key --log_file /workspace/TAO/T_1/log.txt

Please run the commands below and share the results.

tao efficientdet run ls -rlt /workspace/TAO/efficientdet_pretrained_models/*

and

tao efficientdet run md5sum /workspace/TAO/efficientdet_pretrained_models/*

Please check whether your ~/.tao_mounts.json is correct.
In your spec, you set the pretrained model path to "/workspace/TAO/efficientdet_pretrained_models/efficientnet_b1.hdf5".
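
To see which host file a container path resolves to, the mount mapping can be applied by hand. The sketch below assumes `~/.tao_mounts.json` has the layout `{"Mounts": [{"source": ..., "destination": ...}]}`. The helper `host_path_for` and the example host directory are hypothetical and not part of the TAO launcher.

```python
import json
from pathlib import Path, PurePosixPath


def host_path_for(container_path, mounts):
    """Map a container path to a host path using the longest matching
    "destination" prefix. Returns None if no mount covers the path."""
    target = PurePosixPath(container_path)
    best = None
    for mount in mounts.get("Mounts", []):
        dest = PurePosixPath(mount["destination"])
        try:
            rel = target.relative_to(dest)
        except ValueError:
            continue  # this mount does not cover the path
        if best is None or len(dest.parts) > len(best[0].parts):
            best = (dest, PurePosixPath(mount["source"]) / rel)
    return None if best is None else str(best[1])


if __name__ == "__main__":
    # Assumed location and layout of the launcher's mount file.
    config = Path.home() / ".tao_mounts.json"
    if config.exists():
        mounts = json.loads(config.read_text())
        print(host_path_for(
            "/workspace/TAO/efficientdet_pretrained_models/efficientnet_b1.hdf5",
            mounts,
        ))
```

If the result is None, or it points at a directory other than the one the model was downloaded into, the container is not reading the file that was checked on the host.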

I just want to see the "ls -rlt" and "md5sum" output for your models.

ls -rlt /workspace/TAO/efficientdet_pretrained_models/*
-rw------- 1 root root 0 Mar 2 05:07 /workspace/TAO/efficientdet_pretrained_models/efficientnet_b0.hdf5
-rw------- 1 root root 64864720 Mar 7 04:33 /workspace/TAO/efficientdet_pretrained_models/efficientnet_b2.hdf5
-rw------- 1 root root 55160336 Mar 7 13:25 /workspace/TAO/efficientdet_pretrained_models/efficientnet_b1.hdf5
-rw------- 1 root root 89213464 Mar 7 15:43 /workspace/TAO/efficientdet_pretrained_models/efficientnet_b3.hdf5

md5sum /workspace/TAO/efficientdet_pretrained_models/*
d41d8cd98f00b204e9800998ecf8427e /workspace/TAO/efficientdet_pretrained_models/efficientnet_b0.hdf5
49e8b63a6a15c28666e06f028352ea89 /workspace/TAO/efficientdet_pretrained_models/efficientnet_b1.hdf5
483b0fee8386f263807269dd2a0d3b86 /workspace/TAO/efficientdet_pretrained_models/efficientnet_b2.hdf5
6e9eed904dde1a148f10deec358c8314 /workspace/TAO/efficientdet_pretrained_models/efficientnet_b3.hdf5
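
The size and checksum output above can also be reproduced with the Python standard library, which helps when comparing files on a machine without `md5sum`. This is a minimal sketch. The helper names `md5_of_file` and `describe` are illustrative. Note that `d41d8cd98f00b204e9800998ecf8427e` is the MD5 digest of empty input, which matches the 0-byte size listed for efficientnet_b0.hdf5.

```python
import hashlib
from pathlib import Path

# MD5 digest of zero bytes of input; a file with this checksum is empty.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe(path):
    """Return (size_in_bytes, md5_hex, is_empty) for one file."""
    size = Path(path).stat().st_size
    checksum = md5_of_file(path)
    return size, checksum, checksum == EMPTY_MD5


if __name__ == "__main__":
    import sys

    # Pass the files to inspect on the command line.
    for name in sys.argv[1:]:
        size, checksum, is_empty = describe(name)
        note = "  (empty file)" if is_empty else ""
        print(f"{checksum}  {size:>10}  {name}{note}")
```

Running this on both the host copy and the copy seen inside the container shows whether the two sides are reading the same bytes.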

I cannot reproduce your error. The efficientnet_b1.hdf5 works fine.
Please double check on your side.
