Problem with training PeopleNet v2

I’m trying to train PeopleNet v2 from NVIDIA, but I am running into the following error when executing:

!tlt-train detectnet_v2 -e $SPECS_DIR/peoplenet_v2_train_resnet18_kitti.txt \
                    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                    -k tlt_encode \
                    -n resnet18_detector \
                    --gpus $NUM_GPUS

Using TensorFlow backend.
2020-12-29 21:17:12.470165: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.318526: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-29 21:17:15.330095: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.330443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2020-12-29 21:17:15.330469: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.330517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-29 21:17:15.331537: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-29 21:17:15.331824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-29 21:17:15.333273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-29 21:17:15.334346: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-29 21:17:15.334395: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-29 21:17:15.334498: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.335084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.335502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-29 21:17:15.335531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.876336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-29 21:17:15.876394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-12-29 21:17:15.876405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-12-29 21:17:15.876644: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877160: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6774 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-12-29 21:17:15,878 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at /workspace/examples/detectnet_v2/specs/peoplenet_v2_train_resnet18_kitti.txt.
2020-12-29 21:17:15,880 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/examples/detectnet_v2/specs/peoplenet_v2_train_resnet18_kitti.txt
2020-12-29 21:17:16,498 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 6434 samples with a batch size of 16; each epoch will therefore take one extra step.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 3, 544, 960)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 272, 480) 9472        input_1[0][0]                    
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 272, 480) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 272, 480) 0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 136, 240) 36928       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 136, 240) 256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 64, 136, 240) 0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 136, 240) 36928       activation_2[0][0]               
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160        activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 136, 240) 256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 136, 240) 256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 136, 240) 0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 64, 136, 240) 0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 136, 240) 36928       activation_3[0][0]               
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 136, 240) 256         block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 64, 136, 240) 0           block_1b_bn_1[0][0]              
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 136, 240) 36928       activation_4[0][0]               
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160        activation_3[0][0]               
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 136, 240) 256         block_1b_conv_2[0][0]            
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (None, 64, 136, 240) 256         block_1b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 136, 240) 0           block_1b_bn_2[0][0]              
                                                                 block_1b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 64, 136, 240) 0           add_2[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 68, 120) 73856       activation_5[0][0]               
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 68, 120) 512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 128, 68, 120) 0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 68, 120) 147584      activation_6[0][0]               
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320        activation_5[0][0]               
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 68, 120) 512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 68, 120) 512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 68, 120) 0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 128, 68, 120) 0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 68, 120) 147584      activation_7[0][0]               
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 68, 120) 512         block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 128, 68, 120) 0           block_2b_bn_1[0][0]              
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 68, 120) 147584      activation_8[0][0]               
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 68, 120) 16512       activation_7[0][0]               
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 68, 120) 512         block_2b_conv_2[0][0]            
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (None, 128, 68, 120) 512         block_2b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 68, 120) 0           block_2b_bn_2[0][0]              
                                                                 block_2b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 128, 68, 120) 0           add_4[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 34, 60)  295168      activation_9[0][0]               
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 34, 60)  1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 256, 34, 60)  0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 34, 60)  590080      activation_10[0][0]              
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60)  33024       activation_9[0][0]               
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 34, 60)  1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 34, 60)  1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 34, 60)  0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_11 (Activation)      (None, 256, 34, 60)  0           add_5[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 34, 60)  590080      activation_11[0][0]              
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 34, 60)  1024        block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
activation_12 (Activation)      (None, 256, 34, 60)  0           block_3b_bn_1[0][0]              
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 34, 60)  590080      activation_12[0][0]              
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 34, 60)  65792       activation_11[0][0]              
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 34, 60)  1024        block_3b_conv_2[0][0]            
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (None, 256, 34, 60)  1024        block_3b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 34, 60)  0           block_3b_bn_2[0][0]              
                                                                 block_3b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_13 (Activation)      (None, 256, 34, 60)  0           add_6[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 34, 60)  1180160     activation_13[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 34, 60)  2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
activation_14 (Activation)      (None, 512, 34, 60)  0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 34, 60)  2359808     activation_14[0][0]              
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60)  131584      activation_13[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 34, 60)  2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 34, 60)  2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 34, 60)  0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_15 (Activation)      (None, 512, 34, 60)  0           add_7[0][0]                      
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 34, 60)  2359808     activation_15[0][0]              
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 34, 60)  2048        block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
activation_16 (Activation)      (None, 512, 34, 60)  0           block_4b_bn_1[0][0]              
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 34, 60)  2359808     activation_16[0][0]              
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 34, 60)  262656      activation_15[0][0]              
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 34, 60)  2048        block_4b_conv_2[0][0]            
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (None, 512, 34, 60)  2048        block_4b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 34, 60)  0           block_4b_bn_2[0][0]              
                                                                 block_4b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_17 (Activation)      (None, 512, 34, 60)  0           add_8[0][0]                      
__________________________________________________________________________________________________
output_bbox (Conv2D)            (None, 12, 34, 60)   6156        activation_17[0][0]              
__________________________________________________________________________________________________
output_cov (Conv2D)             (None, 3, 34, 60)    1539        activation_17[0][0]              
==================================================================================================
Total params: 11,555,983
Trainable params: 11,544,335
Non-trainable params: 11,648
__________________________________________________________________________________________________
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-29 21:17:20,526 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 8, io threads: 16, compute threads: 8, buffered batches: 4
2020-12-29 21:17:20,526 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 6434, number of sources: 1, batch size per gpu: 16, steps: 403

2020-12-29 21:17:20,655 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-29 21:17:20.696742: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.697123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2020-12-29 21:17:20.697174: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:20.697225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-29 21:17:20.697260: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-29 21:17:20.697288: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-29 21:17:20.697314: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-29 21:17:20.697340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-29 21:17:20.697362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-29 21:17:20.697439: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.697758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.698026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-29 21:17:20,931 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2020-12-29 21:17:20,938 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-29 21:17:20,938 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-12-29 21:17:21,619 [INFO] iva.detectnet_v2.scripts.train: Found 6434 samples in training set
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 8, io threads: 16, compute threads: 8, buffered batches: 4
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 1047, number of sources: 1, batch size per gpu: 16, steps: 66
2020-12-29 21:17:24,635 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-29 21:17:24,892 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2020-12-29 21:17:24,898 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-29 21:17:24,898 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-12-29 21:17:25,404 [INFO] iva.detectnet_v2.scripts.train: Found 1047 samples in validation set
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 121, in extract_checkpoint_file
  File "/usr/lib/python3.6/zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.6/zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 55, in main
  File "<decorator-gen-2>", line 2, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 773, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 691, in run_experiment
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 624, in train_gridbox
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 141, in run_training_loop
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 177, in get_latest_checkpoint
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 155, in get_tf_ckpt
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 126, in extract_checkpoint_file
ValueError: The zipfile extracted was corrupt. Please check your key.

My training config is as follows:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: '/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*'
    image_directory_path: '/workspace/tlt-experiments/data/training'
  }
  image_extension: 'jpg'
  target_class_mapping {
    key: 'person'
    value: 'Person'
  }
  target_class_mapping {
    key: 'Person'
    value: 'Person'
  }
  target_class_mapping {
    key: 'rider'
    value: 'Person'
  }
  target_class_mapping {
    key: 'Rider'
    value: 'Person'
  }
  target_class_mapping {
    key: 'personal_bag'
    value: 'Bag'
  }
  target_class_mapping {
    key: 'rolling_bag'
    value: 'Bag'
  }
  target_class_mapping {
    key: 'face'
    value: 'Face'
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: 'Person'
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: 'Bag'
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
  target_class_config {
    key: 'Face'
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
}
model_config {
  pretrained_model_file: '/workspace/tlt-experiments/detectnet_v2/pretrained_peoplenet/tlt_peoplenet_vunpruned_v2.0/resnet18_peoplenet.tlt'
  num_layers: 18
  load_graph: True
  use_batch_norm: False
  activation {
    activation_type: 'relu'
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: 'resnet'
}
evaluation_config {
  validation_period_during_training: 1
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: 'Person'
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: 'Bag'
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: 'Face'
    value: 0.5
  }
  evaluation_box_config {
    key: 'Person'
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: 'Bag'
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: 'Face'
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: 'Person'
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: 'cov'
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: 'bbox'
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: 'Bag'
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: 'cov'
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: 'bbox'
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: 'Face'
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: 'cov'
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: 'bbox'
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 10e-10
      max_learning_rate: 10e-10
      soft_start: 0.0
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: 'Person'
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: 'Bag'
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: 'Face'
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

If anyone could help, it would be much appreciated.

Could you please remove the result folder and re-run the command below?

!tlt-train detectnet_v2 -e $SPECS_DIR/peoplenet_v2_train_resnet18_kitti.txt \
                    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                    -k tlt_encode \
                    -n resnet18_detector \
                    --gpus $NUM_GPUS
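
For context on why clearing the result folder can matter: the traceback in the question passes through get_latest_checkpoint and extract_checkpoint_file before failing in Python's zipfile module. That suggests (an inference from the function names, not something confirmed in this thread) that the trainer picked up an existing file in the results folder and could not open it as a zip archive. The sketch below is not TLT code. The helper names extract_checkpoint and looks_like_zip and the file name stale_checkpoint.bin are hypothetical, and it only shows how a non-zip file produces the same pair of chained exceptions using the standard library.

```python
import os
import tempfile
import zipfile


def extract_checkpoint(path):
    """Hypothetical helper: open `path` as a zip archive and list its members."""
    try:
        with zipfile.ZipFile(path, "r") as archive:
            return archive.namelist()
    except zipfile.BadZipFile:
        # Raising a new exception inside the except block chains it to the
        # original one, which is what prints "During handling of the above
        # exception, another exception occurred" in a traceback.
        raise ValueError("The zipfile extracted was corrupt. Please check your key.")


def looks_like_zip(path):
    """Return True if the file passes the standard-library zip signature check."""
    return zipfile.is_zipfile(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate a leftover file that is not a valid zip archive.
        stale = os.path.join(tmp, "stale_checkpoint.bin")  # hypothetical name
        with open(stale, "wb") as handle:
            handle.write(b"not a zip archive")

        print("looks like a zip:", looks_like_zip(stale))
        try:
            extract_checkpoint(stale)
        except ValueError as err:
            print("ValueError:", err)
            print("caused while handling:", type(err.__context__).__name__)
```

A check like looks_like_zip only tells you whether a file is a plain zip archive. It says nothing about whether an encrypted model file matches the key passed with -k.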

I got it to work now by changing the directories around. Thank you Morganh for your help!

Hello manbencharongkul,
Could you please tell us how you managed to solve this issue?
I am having the same error.

Please remove the result folder.
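
If you are working in the notebook, removing the result folder can be done from a Python cell. This is a minimal sketch, assuming the result folder is the directory passed to -r in the command above ($USER_EXPERIMENT_DIR/experiment_dir_unpruned). The helper name clear_results_dir is hypothetical, and the deletion is permanent, so check the path before running it.

```python
import os
import shutil


def clear_results_dir(results_dir):
    """Delete `results_dir` and everything inside it.

    Returns the sorted names of the top-level entries that were removed,
    or an empty list if the directory did not exist.
    """
    if not os.path.isdir(results_dir):
        return []
    removed = sorted(os.listdir(results_dir))
    shutil.rmtree(results_dir)
    return removed


if __name__ == "__main__":
    # Assumption: USER_EXPERIMENT_DIR is set as in the training command.
    base = os.environ.get("USER_EXPERIMENT_DIR")
    if base is None:
        print("USER_EXPERIMENT_DIR is not set; nothing removed.")
    else:
        target = os.path.join(base, "experiment_dir_unpruned")
        print("Removed entries:", clear_results_dir(target))
```

After the folder is gone, re-run the tlt-train command so training starts without any leftover files from a previous run.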