I’m trying to train Peoplenet_v2 from NVIDIA but am running into the following error when executing:
!tlt-train detectnet_v2 -e $SPECS_DIR/peoplenet_v2_train_resnet18_kitti.txt \
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
-k tlt_encode \
-n resnet18_detector \
--gpus $NUM_GPUS
Using TensorFlow backend.
2020-12-29 21:17:12.470165: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.318526: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-29 21:17:15.330095: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.330443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2020-12-29 21:17:15.330469: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.330517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-29 21:17:15.331537: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-29 21:17:15.331824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-29 21:17:15.333273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-29 21:17:15.334346: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-29 21:17:15.334395: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-29 21:17:15.334498: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.335084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.335502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-29 21:17:15.335531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.876336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-29 21:17:15.876394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-12-29 21:17:15.876405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-12-29 21:17:15.876644: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877160: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6774 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-12-29 21:17:15,878 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at /workspace/examples/detectnet_v2/specs/peoplenet_v2_train_resnet18_kitti.txt.
2020-12-29 21:17:15,880 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/examples/detectnet_v2/specs/peoplenet_v2_train_resnet18_kitti.txt
2020-12-29 21:17:16,498 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 6434 samples with a batch size of 16; each epoch will therefore take one extra step.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 544, 960) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 272, 480) 9472 input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 272, 480) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 272, 480) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 136, 240) 36928 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 64, 136, 240) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 136, 240) 36928 activation_2[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 136, 240) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 136, 240) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 64, 136, 240) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (None, 64, 136, 240) 36928 activation_3[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 64, 136, 240) 0 block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (None, 64, 136, 240) 36928 activation_4[0][0]
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160 activation_3[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1b_conv_2[0][0]
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (None, 64, 136, 240) 256 block_1b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 64, 136, 240) 0 block_1b_bn_2[0][0]
block_1b_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 64, 136, 240) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 68, 120) 73856 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 128, 68, 120) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 68, 120) 147584 activation_6[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 68, 120) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 128, 68, 120) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 128, 68, 120) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (None, 128, 68, 120) 147584 activation_7[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 128, 68, 120) 0 block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (None, 128, 68, 120) 147584 activation_8[0][0]
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 68, 120) 16512 activation_7[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2b_conv_2[0][0]
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (None, 128, 68, 120) 512 block_2b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 128, 68, 120) 0 block_2b_bn_2[0][0]
block_2b_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 128, 68, 120) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 34, 60) 295168 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 256, 34, 60) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 34, 60) 590080 activation_10[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60) 33024 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 34, 60) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 256, 34, 60) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_11 (Activation) (None, 256, 34, 60) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (None, 256, 34, 60) 590080 activation_11[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
activation_12 (Activation) (None, 256, 34, 60) 0 block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (None, 256, 34, 60) 590080 activation_12[0][0]
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 34, 60) 65792 activation_11[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3b_conv_2[0][0]
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (None, 256, 34, 60) 1024 block_3b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 256, 34, 60) 0 block_3b_bn_2[0][0]
block_3b_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_13 (Activation) (None, 256, 34, 60) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 34, 60) 1180160 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
activation_14 (Activation) (None, 512, 34, 60) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 activation_14[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60) 131584 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 34, 60) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 512, 34, 60) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_15 (Activation) (None, 512, 34, 60) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (None, 512, 34, 60) 2359808 activation_15[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
activation_16 (Activation) (None, 512, 34, 60) 0 block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 activation_16[0][0]
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 34, 60) 262656 activation_15[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4b_conv_2[0][0]
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (None, 512, 34, 60) 2048 block_4b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 512, 34, 60) 0 block_4b_bn_2[0][0]
block_4b_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_17 (Activation) (None, 512, 34, 60) 0 add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D) (None, 12, 34, 60) 6156 activation_17[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D) (None, 3, 34, 60) 1539 activation_17[0][0]
==================================================================================================
Total params: 11,555,983
Trainable params: 11,544,335
Non-trainable params: 11,648
__________________________________________________________________________________________________
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-29 21:17:20,526 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 8, io threads: 16, compute threads: 8, buffered batches: 4
2020-12-29 21:17:20,526 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 6434, number of sources: 1, batch size per gpu: 16, steps: 403
2020-12-29 21:17:20,655 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-29 21:17:20.696742: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.697123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2020-12-29 21:17:20.697174: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:20.697225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-29 21:17:20.697260: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-29 21:17:20.697288: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-29 21:17:20.697314: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-29 21:17:20.697340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-29 21:17:20.697362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-29 21:17:20.697439: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.697758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.698026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-29 21:17:20,931 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2020-12-29 21:17:20,938 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-29 21:17:20,938 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-12-29 21:17:21,619 [INFO] iva.detectnet_v2.scripts.train: Found 6434 samples in training set
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 8, io threads: 16, compute threads: 8, buffered batches: 4
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 1047, number of sources: 1, batch size per gpu: 16, steps: 66
2020-12-29 21:17:24,635 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-29 21:17:24,892 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2020-12-29 21:17:24,898 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-29 21:17:24,898 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-12-29 21:17:25,404 [INFO] iva.detectnet_v2.scripts.train: Found 1047 samples in validation set
Traceback (most recent call last):
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 121, in extract_checkpoint_file
File "/usr/lib/python3.6/zipfile.py", line 1131, in __init__
self._RealGetContents()
File "/usr/lib/python3.6/zipfile.py", line 1198, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 55, in main
File "<decorator-gen-2>", line 2, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 773, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 691, in run_experiment
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 624, in train_gridbox
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 141, in run_training_loop
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 177, in get_latest_checkpoint
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 155, in get_tf_ckpt
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 126, in extract_checkpoint_file
ValueError: The zipfile extracted was corrupt. Please check your key.
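For reference, the first exception in the traceback is the standard-library `zipfile.BadZipFile`, which Python raises whenever the bytes given to `zipfile.ZipFile` are not a valid archive. The final `ValueError` suggests the key as one possible cause. Below is a minimal sketch that reproduces only the standard-library behavior. It does not use any TLT code; the helper name `looks_like_zip` and the sample bytes are made up for illustration, and the "wrong key" comment is an assumption about what such data might look like.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if `data` can be opened as a zip archive (hypothetical helper)."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)):
            return True
    except zipfile.BadZipFile:
        # Same exception type as the first error in the traceback above.
        return False


# Build a small valid archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("weights.txt", "dummy")
valid_bytes = buf.getvalue()

# Arbitrary bytes standing in for data that did not decode into an archive
# (an assumption for illustration, e.g. after using a wrong key).
garbage_bytes = b"\x00\x01 not an archive " * 8

print(looks_like_zip(valid_bytes))
print(looks_like_zip(garbage_bytes))
```

So the question seems to come down to why the checkpoint that `get_latest_checkpoint` picks up does not decode into a valid archive with the key `tlt_encode`.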
My training config is as follows:
random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: '/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*'
    image_directory_path: '/workspace/tlt-experiments/data/training'
  }
  image_extension: 'jpg'
  target_class_mapping { key: 'person' value: 'Person' }
  target_class_mapping { key: 'Person' value: 'Person' }
  target_class_mapping { key: 'rider' value: 'Person' }
  target_class_mapping { key: 'Rider' value: 'Person' }
  target_class_mapping { key: 'personal_bag' value: 'Bag' }
  target_class_mapping { key: 'rolling_bag' value: 'Bag' }
  target_class_mapping { key: 'face' value: 'Face' }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: 'Person'
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: 'Bag'
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: 'Face'
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
}
model_config {
  pretrained_model_file: '/workspace/tlt-experiments/detectnet_v2/pretrained_peoplenet/tlt_peoplenet_vunpruned_v2.0/resnet18_peoplenet.tlt'
  num_layers: 18
  load_graph: True
  use_batch_norm: False
  activation { activation_type: 'relu' }
  objective_set {
    bbox { scale: 35.0 offset: 0.5 }
    cov { }
  }
  training_precision { backend_floatx: FLOAT32 }
  arch: 'resnet'
}
evaluation_config {
  validation_period_during_training: 1
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap { key: 'Person' value: 0.699999988079 }
  minimum_detection_ground_truth_overlap { key: 'Bag' value: 0.5 }
  minimum_detection_ground_truth_overlap { key: 'Face' value: 0.5 }
  evaluation_box_config {
    key: 'Person'
    value { minimum_height: 20 maximum_height: 9999 minimum_width: 10 maximum_width: 9999 }
  }
  evaluation_box_config {
    key: 'Bag'
    value { minimum_height: 20 maximum_height: 9999 minimum_width: 10 maximum_width: 9999 }
  }
  evaluation_box_config {
    key: 'Face'
    value { minimum_height: 20 maximum_height: 9999 minimum_width: 10 maximum_width: 9999 }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: 'Person'
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives { name: 'cov' initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: 'bbox' initial_weight: 10.0 weight_target: 10.0 }
  }
  target_classes {
    name: 'Bag'
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives { name: 'cov' initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: 'bbox' initial_weight: 10.0 weight_target: 1.0 }
  }
  target_classes {
    name: 'Face'
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives { name: 'cov' initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: 'bbox' initial_weight: 10.0 weight_target: 10.0 }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 10e-10
      max_learning_rate: 10e-10
      soft_start: 0.0
      annealing: 0.3
    }
  }
  regularizer { type: L1 weight: 3.00000002618e-09 }
  optimizer {
    adam { epsilon: 9.99999993923e-09 beta1: 0.899999976158 beta2: 0.999000012875 }
  }
  cost_scaling { initial_exponent: 20.0 increment: 0.005 decrement: 1.0 }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: 'Person'
    value { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 0.40000000596 cov_radius_y: 0.40000000596 bbox_min_radius: 1.0 }
  }
  target_class_config {
    key: 'Bag'
    value { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 1.0 cov_radius_y: 1.0 bbox_min_radius: 1.0 }
  }
  target_class_config {
    key: 'Face'
    value { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 1.0 cov_radius_y: 1.0 bbox_min_radius: 1.0 }
  }
  deadzone_radius: 0.400000154972
}
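As a sanity check that the dataset and `batch_size_per_gpu: 16` are being read as expected, the step counts in the log (403 for the 6434 training samples, 66 for the 1047 validation samples) match a simple ceiling division. This is only my own arithmetic, assuming one GPU and that a partial final batch counts as a step; `steps_per_epoch` is a made-up helper, not a TLT function.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover all samples, counting a partial last batch."""
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(6434, 16))  # training set size from the log
print(steps_per_epoch(1047, 16))  # validation set size from the log
```

So the data loading side looks consistent to me, and the failure only happens afterwards, when the checkpoint is extracted.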
If anyone could help, it would be much appreciated.