Problem with training PeopleNet v2

I’m trying to train the PeopleNet v2 model from NVIDIA but am running into the following error when executing:

!tlt-train detectnet_v2 -e $SPECS_DIR/peoplenet_v2_train_resnet18_kitti.txt \
                    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                    -k tlt_encode \
                    -n resnet18_detector \
                    --gpus $NUM_GPUS

Using TensorFlow backend.
2020-12-29 21:17:12.470165: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.318526: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-29 21:17:15.330095: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.330443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2020-12-29 21:17:15.330469: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.330517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-29 21:17:15.331537: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-29 21:17:15.331824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-29 21:17:15.333273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-29 21:17:15.334346: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-29 21:17:15.334395: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-29 21:17:15.334498: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.335084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.335502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-29 21:17:15.335531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:15.876336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-29 21:17:15.876394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-12-29 21:17:15.876405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-12-29 21:17:15.876644: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877160: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:15.877799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6774 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-12-29 21:17:15,878 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at /workspace/examples/detectnet_v2/specs/peoplenet_v2_train_resnet18_kitti.txt.
2020-12-29 21:17:15,880 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/examples/detectnet_v2/specs/peoplenet_v2_train_resnet18_kitti.txt
2020-12-29 21:17:16,498 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 6434 samples with a batch size of 16; each epoch will therefore take one extra step.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 3, 544, 960)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 272, 480) 9472        input_1[0][0]                    
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 272, 480) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 272, 480) 0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 136, 240) 36928       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 136, 240) 256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 64, 136, 240) 0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 136, 240) 36928       activation_2[0][0]               
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160        activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 136, 240) 256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 136, 240) 256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 136, 240) 0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 64, 136, 240) 0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 136, 240) 36928       activation_3[0][0]               
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 136, 240) 256         block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 64, 136, 240) 0           block_1b_bn_1[0][0]              
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 136, 240) 36928       activation_4[0][0]               
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160        activation_3[0][0]               
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 136, 240) 256         block_1b_conv_2[0][0]            
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (None, 64, 136, 240) 256         block_1b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 136, 240) 0           block_1b_bn_2[0][0]              
                                                                 block_1b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 64, 136, 240) 0           add_2[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 68, 120) 73856       activation_5[0][0]               
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 68, 120) 512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 128, 68, 120) 0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 68, 120) 147584      activation_6[0][0]               
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320        activation_5[0][0]               
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 68, 120) 512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 68, 120) 512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 68, 120) 0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 128, 68, 120) 0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 68, 120) 147584      activation_7[0][0]               
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 68, 120) 512         block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 128, 68, 120) 0           block_2b_bn_1[0][0]              
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 68, 120) 147584      activation_8[0][0]               
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 68, 120) 16512       activation_7[0][0]               
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 68, 120) 512         block_2b_conv_2[0][0]            
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (None, 128, 68, 120) 512         block_2b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 68, 120) 0           block_2b_bn_2[0][0]              
                                                                 block_2b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 128, 68, 120) 0           add_4[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 34, 60)  295168      activation_9[0][0]               
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 34, 60)  1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 256, 34, 60)  0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 34, 60)  590080      activation_10[0][0]              
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60)  33024       activation_9[0][0]               
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 34, 60)  1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 34, 60)  1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 34, 60)  0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_11 (Activation)      (None, 256, 34, 60)  0           add_5[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 34, 60)  590080      activation_11[0][0]              
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 34, 60)  1024        block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
activation_12 (Activation)      (None, 256, 34, 60)  0           block_3b_bn_1[0][0]              
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 34, 60)  590080      activation_12[0][0]              
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 34, 60)  65792       activation_11[0][0]              
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 34, 60)  1024        block_3b_conv_2[0][0]            
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (None, 256, 34, 60)  1024        block_3b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 34, 60)  0           block_3b_bn_2[0][0]              
                                                                 block_3b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_13 (Activation)      (None, 256, 34, 60)  0           add_6[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 34, 60)  1180160     activation_13[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 34, 60)  2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
activation_14 (Activation)      (None, 512, 34, 60)  0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 34, 60)  2359808     activation_14[0][0]              
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60)  131584      activation_13[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 34, 60)  2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 34, 60)  2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 34, 60)  0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_15 (Activation)      (None, 512, 34, 60)  0           add_7[0][0]                      
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 34, 60)  2359808     activation_15[0][0]              
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 34, 60)  2048        block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
activation_16 (Activation)      (None, 512, 34, 60)  0           block_4b_bn_1[0][0]              
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 34, 60)  2359808     activation_16[0][0]              
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 34, 60)  262656      activation_15[0][0]              
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 34, 60)  2048        block_4b_conv_2[0][0]            
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (None, 512, 34, 60)  2048        block_4b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 34, 60)  0           block_4b_bn_2[0][0]              
                                                                 block_4b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
activation_17 (Activation)      (None, 512, 34, 60)  0           add_8[0][0]                      
__________________________________________________________________________________________________
output_bbox (Conv2D)            (None, 12, 34, 60)   6156        activation_17[0][0]              
__________________________________________________________________________________________________
output_cov (Conv2D)             (None, 3, 34, 60)    1539        activation_17[0][0]              
==================================================================================================
Total params: 11,555,983
Trainable params: 11,544,335
Non-trainable params: 11,648
__________________________________________________________________________________________________
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-29 21:17:20,525 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-29 21:17:20,526 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 8, io threads: 16, compute threads: 8, buffered batches: 4
2020-12-29 21:17:20,526 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 6434, number of sources: 1, batch size per gpu: 16, steps: 403

2020-12-29 21:17:20,655 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-29 21:17:20.696742: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.697123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2020-12-29 21:17:20.697174: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-29 21:17:20.697225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-29 21:17:20.697260: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-29 21:17:20.697288: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-29 21:17:20.697314: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-29 21:17:20.697340: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-29 21:17:20.697362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-29 21:17:20.697439: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.697758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-29 21:17:20.698026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-29 21:17:20,931 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2020-12-29 21:17:20,938 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-29 21:17:20,938 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-12-29 21:17:21,619 [INFO] iva.detectnet_v2.scripts.train: Found 6434 samples in training set
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 8, io threads: 16, compute threads: 8, buffered batches: 4
2020-12-29 21:17:24,596 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 1047, number of sources: 1, batch size per gpu: 16, steps: 66
2020-12-29 21:17:24,635 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-29 21:17:24,892 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2020-12-29 21:17:24,898 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-29 21:17:24,898 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-12-29 21:17:25,404 [INFO] iva.detectnet_v2.scripts.train: Found 1047 samples in validation set
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 121, in extract_checkpoint_file
  File "/usr/lib/python3.6/zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.6/zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 55, in main
  File "<decorator-gen-2>", line 2, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 773, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 691, in run_experiment
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 624, in train_gridbox
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 141, in run_training_loop
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 177, in get_latest_checkpoint
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 155, in get_tf_ckpt
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/utilities.py", line 126, in extract_checkpoint_file
ValueError: The zipfile extracted was corrupt. Please check your key.

My training config is as follows:

random_seed: 42
dataset_config {
    data_sources {
        tfrecords_path: '/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*'
        image_directory_path: '/workspace/tlt-experiments/data/training'
    }
    image_extension: 'jpg'
    target_class_mapping {
        key: 'person'
        value: 'Person'
    }
    target_class_mapping {
        key: 'Person'
        value: 'Person'
    }
    target_class_mapping {
        key: 'rider'
        value: 'Person'
    }
    target_class_mapping {
        key: 'Rider'
        value: 'Person'
    }
    target_class_mapping {
        key: 'personal_bag'
        value: 'Bag'
    }
    target_class_mapping {
        key: 'rolling_bag'
        value: 'Bag'
    }
    target_class_mapping {
        key: 'face'
        value: 'Face'
    }
    validation_fold: 0
}
augmentation_config {
    preprocessing {
        output_image_width: 960
        output_image_height: 544
        crop_right: 960
        crop_bottom: 544
        min_bbox_width: 1.0
        min_bbox_height: 1.0
        output_image_channel: 3
    }
    spatial_augmentation {
        hflip_probability: 0.5
        zoom_min: 1.0
        zoom_max: 1.0
        translate_max_x: 8.0
        translate_max_y: 8.0
    }
    color_augmentation {
        hue_rotation_max: 25.0
        saturation_shift_max: 0.20000000298
        contrast_scale_max: 0.10000000149
        contrast_center: 0.5
    }
}
postprocessing_config {
    target_class_config {
        key: 'Person'
        value {
            clustering_config {
                coverage_threshold: 0.00499999988824
                dbscan_eps: 0.20000000298
                dbscan_min_samples: 0.0500000007451
                minimum_bounding_box_height: 4
            }
        }
    }
    target_class_config {
        key: 'Bag'
        value {
            clustering_config {
                coverage_threshold: 0.00499999988824
                dbscan_eps: 0.15000000596
                dbscan_min_samples: 0.0500000007451
                minimum_bounding_box_height: 4
            }
        }
    }
    target_class_config {
        key: 'Face'
        value {
            clustering_config {
                coverage_threshold: 0.00499999988824
                dbscan_eps: 0.15000000596
                dbscan_min_samples: 0.0500000007451
                minimum_bounding_box_height: 4
            }
        }
    }
}
model_config {
    pretrained_model_file: '/workspace/tlt-experiments/detectnet_v2/pretrained_peoplenet/tlt_peoplenet_vunpruned_v2.0/resnet18_peoplenet.tlt'
    num_layers: 18
    load_graph: True
    use_batch_norm: False
    activation {
        activation_type: 'relu'
    }
    objective_set {
        bbox {
            scale: 35.0
            offset: 0.5
        }
        cov {
        }
    }
    training_precision {
        backend_floatx: FLOAT32
    }
    arch: 'resnet'
}
evaluation_config {
    validation_period_during_training: 1
    first_validation_epoch: 1
    minimum_detection_ground_truth_overlap {
        key: 'Person'
        value: 0.699999988079
    }
    minimum_detection_ground_truth_overlap {
        key: 'Bag'
        value: 0.5
    }
    minimum_detection_ground_truth_overlap {
        key: 'Face'
        value: 0.5
    }
    evaluation_box_config {
        key: 'Person'
        value {
            minimum_height: 20
            maximum_height: 9999
            minimum_width: 10
            maximum_width: 9999
        }
    }
    evaluation_box_config {
        key: 'Bag'
        value {
            minimum_height: 20
            maximum_height: 9999
            minimum_width: 10
            maximum_width: 9999
        }
    }
    evaluation_box_config {
        key: 'Face'
        value {
            minimum_height: 20
            maximum_height: 9999
            minimum_width: 10
            maximum_width: 9999
        }
    }
    average_precision_mode: INTEGRATE
}
cost_function_config {
    target_classes {
        name: 'Person'
        class_weight: 1.0
        coverage_foreground_weight: 0.0500000007451
        objectives {
            name: 'cov'
            initial_weight: 1.0
            weight_target: 1.0
        }
        objectives {
            name: 'bbox'
            initial_weight: 10.0
            weight_target: 10.0
        }
    }
    target_classes {
        name: 'Bag'
        class_weight: 8.0
        coverage_foreground_weight: 0.0500000007451
        objectives {
            name: 'cov'
            initial_weight: 1.0
            weight_target: 1.0
        }
        objectives {
            name: 'bbox'
            initial_weight: 10.0
            weight_target: 1.0
        }
    }
    target_classes {
        name: 'Face'
        class_weight: 4.0
        coverage_foreground_weight: 0.0500000007451
        objectives {
            name: 'cov'
            initial_weight: 1.0
            weight_target: 1.0
        }
        objectives {
            name: 'bbox'
            initial_weight: 10.0
            weight_target: 10.0
        }
    }
    enable_autoweighting: true
    max_objective_weight: 0.999899983406
    min_objective_weight: 9.99999974738e-05
}
training_config {
    batch_size_per_gpu: 16
    num_epochs: 10
    learning_rate {
        soft_start_annealing_schedule {
            min_learning_rate: 10e-10
            max_learning_rate: 10e-10
            soft_start: 0.0
            annealing: 0.3
        }
    }
    regularizer {
        type: L1
        weight: 3.00000002618e-09
    }
    optimizer {
        adam {
            epsilon: 9.99999993923e-09
            beta1: 0.899999976158
            beta2: 0.999000012875
        }
    }
    cost_scaling {
        initial_exponent: 20.0
        increment: 0.005
        decrement: 1.0
    }
    checkpoint_interval: 10
}
bbox_rasterizer_config {
    target_class_config {
        key: 'Person'
        value {
            cov_center_x: 0.5
            cov_center_y: 0.5
            cov_radius_x: 0.40000000596
            cov_radius_y: 0.40000000596
            bbox_min_radius: 1.0
        }
    }
    target_class_config {
        key: 'Bag'
        value {
            cov_center_x: 0.5
            cov_center_y: 0.5
            cov_radius_x: 1.0
            cov_radius_y: 1.0
            bbox_min_radius: 1.0
        }
    }
    target_class_config {
        key: 'Face'
        value {
            cov_center_x: 0.5
            cov_center_y: 0.5
            cov_radius_x: 1.0
            cov_radius_y: 1.0
            bbox_min_radius: 1.0
        }
    }
    deadzone_radius: 0.400000154972
}

If anyone could help, it would be much appreciated.

Could you please remove the result folder and re-run the command below?

!tlt-train detectnet_v2 -e $SPECS_DIR/peoplenet_v2_train_resnet18_kitti.txt \
                    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                    -k tlt_encode \
                    -n resnet18_detector \
                    --gpus $NUM_GPUS
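
For anyone hitting the same error: the second traceback passes through get_latest_checkpoint before failing in extract_checkpoint_file, which suggests the trainer found a leftover checkpoint in the -r folder and could not decrypt it with the current key. Here is a minimal sketch of moving that folder aside from a notebook cell instead of deleting it. It assumes the notebook's Python process sees the same path and the same USER_EXPERIMENT_DIR environment variable as the tlt-train command; the helper name set_aside_results_dir is made up for illustration and is not part of TLT.

```python
import os
import shutil
import time


def set_aside_results_dir(results_dir):
    """Rename an existing results folder so the next run starts from a clean one.

    Returns the backup path, or None if there was nothing to move.
    """
    if not os.path.isdir(results_dir):
        return None
    # A timestamp suffix keeps earlier backups from being overwritten.
    stamp = time.strftime("%Y%m%d_%H%M%S")
    backup_dir = "{}.bak_{}".format(results_dir.rstrip(os.sep), stamp)
    shutil.move(results_dir, backup_dir)
    return backup_dir


# Assumption: USER_EXPERIMENT_DIR is set in this Python process, as it is for
# the tlt-train command above. If it is not set, nothing is touched.
experiment_root = os.environ.get("USER_EXPERIMENT_DIR")
if experiment_root:
    moved_to = set_aside_results_dir(
        os.path.join(experiment_root, "experiment_dir_unpruned")
    )
    print("Moved old results to:", moved_to)
```

Renaming rather than deleting keeps the old checkpoints around in case they are still needed; once the new run works, the backup folder can be removed by hand.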

I got it to work now by changing the directories around. Thank you Morganh for your help!