I have learned that to train with TLT, all dataset images must have the same size. However, image sizes vary in the KITTI dataset. Is there a script that resizes all the KITTI images and scales their labels at the same time?
Hi xhuv,
In TLT 1.0.1, the tlt-train tool does not support training the detectnet_v2 or ssd networks on images of multiple resolutions, nor does it resize images during training. All images must be resized offline to the final training size, and the corresponding bounding boxes must be scaled by the same factors.
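The reply does not name a ready-made script, so the following is only a minimal sketch of the offline step it describes. It assumes KITTI-format label files (one object per line, with the 2D box in 0-based fields 4 to 7 as left, top, right, bottom in pixels), an `image_2`/`label_2` directory layout, and Pillow installed for the image side. The helper names (`scale_kitti_line`, `scale_label_text`, `resize_kitti_dataset`) are hypothetical. Each image is resized to one fixed width and height, and its box coordinates are multiplied by the same x and y scale factors.

```python
"""Resize KITTI images to one fixed size and rescale their 2D boxes to match.

Assumptions (hypothetical, adjust to your dataset):
  * images are in <root>/image_2 and labels in <root>/label_2,
  * each label line follows the KITTI object format, where fields
    4..7 (0-based) are the bbox left, top, right, bottom in pixels.
"""
import os


def scale_kitti_line(line, sx, sy):
    """Return one KITTI label line with its 2D box scaled by (sx, sy)."""
    fields = line.split()
    if len(fields) < 8:
        return line.strip()  # blank or malformed line: leave it untouched
    left, top, right, bottom = (float(v) for v in fields[4:8])
    fields[4:8] = ["%.2f" % (left * sx), "%.2f" % (top * sy),
                   "%.2f" % (right * sx), "%.2f" % (bottom * sy)]
    return " ".join(fields)  # all other fields are copied unchanged


def scale_label_text(text, sx, sy):
    """Scale every non-empty object line in the contents of one label file."""
    lines = [scale_kitti_line(l, sx, sy) for l in text.splitlines() if l.strip()]
    return "\n".join(lines) + ("\n" if lines else "")


def resize_kitti_dataset(src_root, dst_root, width, height):
    """Write resized images and rescaled labels from src_root into dst_root."""
    from PIL import Image  # Pillow is only needed for the image side

    src_img = os.path.join(src_root, "image_2")
    src_lbl = os.path.join(src_root, "label_2")
    dst_img = os.path.join(dst_root, "image_2")
    dst_lbl = os.path.join(dst_root, "label_2")
    for d in (dst_img, dst_lbl):
        if not os.path.isdir(d):
            os.makedirs(d)

    for name in sorted(os.listdir(src_img)):
        stem, _ = os.path.splitext(name)
        with Image.open(os.path.join(src_img, name)) as img:
            w, h = img.size
            sx, sy = float(width) / w, float(height) / h  # per-axis scale factors
            img.resize((width, height), Image.BILINEAR).save(os.path.join(dst_img, name))
        with open(os.path.join(src_lbl, stem + ".txt")) as f:
            text = f.read()
        with open(os.path.join(dst_lbl, stem + ".txt"), "w") as f:
            f.write(scale_label_text(text, sx, sy))
```

For example, `resize_kitti_dataset("/data/kitti/training", "/data/kitti_resized/training", 1248, 384)` would write a resized copy (paths and size are placeholders). Because training reads TFRecords, those would need to be regenerated from the resized copy.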
In the KITTI dataset, the images are mostly the same size, so resizing is not needed.
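To confirm that the images really do share one size before deciding whether to resize, a minimal sketch like the following tallies the image dimensions. It assumes the images are PNG files (as in the KITTI object-detection release) and reads width and height directly from the PNG IHDR header using only the standard library. The helper names and the directory path in the usage note are hypothetical.

```python
import os
import struct
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then
    # big-endian unsigned 32-bit width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", header[16:24])


def tally_png_sizes(image_dir):
    """Count how many PNG images in image_dir have each (width, height)."""
    sizes = Counter()
    for name in os.listdir(image_dir):
        if name.lower().endswith(".png"):
            with open(os.path.join(image_dir, name), "rb") as f:
                sizes[png_size(f.read(24))] += 1
    return sizes
```

For example, `print(tally_png_sizes("/workspace/data/training/image_2").most_common())` lists each size with its image count. A single entry means no resizing is needed.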
Using the original dataset without resizing, I ran:
!tlt-train detectnet_v2 -e $SPECS_DIR/train.txt \
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
-k $KEY \
-n resnet18_detector
Training completes, but the average precision is still zero after 120 epochs:
Using TensorFlow backend.
--------------------------------------------------------------------------
[[5279,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: 629ffbf9ff63
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
2020-03-10 17:56:15.977105: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-10 17:56:16.112012: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-10 17:56:16.112770: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5f44020 executing computations on platform CUDA. Devices:
2020-03-10 17:56:16.112787: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-03-10 17:56:16.114468: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3408000000 Hz
2020-03-10 17:56:16.114843: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5facdb0 executing computations on platform Host. Devices:
2020-03-10 17:56:16.114860: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2020-03-10 17:56:16.115005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 7.19GiB
2020-03-10 17:56:16.115023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-10 17:56:16.116016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-10 17:56:16.116030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-10 17:56:16.116038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-10 17:56:16.116112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6998 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-03-10 17:56:16,116 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at /workspace/spec_files/train.txt.
2020-03-10 17:56:16,117 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/spec_files/train.txt
WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2020-03-10 17:56:16,125 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2020-03-10 17:56:16,192 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 6359 samples with a batch size of 16; each epoch will therefore take one extra step.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-03-10 17:56:16,199 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2020-03-10 17:56:16,211 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 128, 512) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 64, 256) 9472 input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 64, 256) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 64, 256) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 32, 128) 36928 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 32, 128) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 64, 32, 128) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 32, 128) 36928 activation_2[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 32, 128) 4160 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 32, 128) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 32, 128) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 32, 128) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 64, 32, 128) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (None, 64, 32, 128) 36928 activation_3[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 32, 128) 256 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 64, 32, 128) 0 block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (None, 64, 32, 128) 36928 activation_4[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 32, 128) 256 block_1b_conv_2[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 64, 32, 128) 0 block_1b_bn_2[0][0]
activation_3[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 64, 32, 128) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 16, 64) 73856 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 16, 64) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 128, 16, 64) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 16, 64) 147584 activation_6[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 16, 64) 8320 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 16, 64) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 16, 64) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 128, 16, 64) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 128, 16, 64) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (None, 128, 16, 64) 147584 activation_7[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 16, 64) 512 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 128, 16, 64) 0 block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (None, 128, 16, 64) 147584 activation_8[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 16, 64) 512 block_2b_conv_2[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 128, 16, 64) 0 block_2b_bn_2[0][0]
activation_7[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 128, 16, 64) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 8, 32) 295168 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 8, 32) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 256, 8, 32) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 8, 32) 590080 activation_10[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 8, 32) 33024 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 8, 32) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 8, 32) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 256, 8, 32) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_11 (Activation) (None, 256, 8, 32) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (None, 256, 8, 32) 590080 activation_11[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 8, 32) 1024 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
activation_12 (Activation) (None, 256, 8, 32) 0 block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (None, 256, 8, 32) 590080 activation_12[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 8, 32) 1024 block_3b_conv_2[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 256, 8, 32) 0 block_3b_bn_2[0][0]
activation_11[0][0]
__________________________________________________________________________________________________
activation_13 (Activation) (None, 256, 8, 32) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 8, 32) 1180160 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 8, 32) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
activation_14 (Activation) (None, 512, 8, 32) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 8, 32) 2359808 activation_14[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 8, 32) 131584 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 8, 32) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 8, 32) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 512, 8, 32) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_15 (Activation) (None, 512, 8, 32) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (None, 512, 8, 32) 2359808 activation_15[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 8, 32) 2048 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
activation_16 (Activation) (None, 512, 8, 32) 0 block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (None, 512, 8, 32) 2359808 activation_16[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 8, 32) 2048 block_4b_conv_2[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 512, 8, 32) 0 block_4b_bn_2[0][0]
activation_15[0][0]
__________________________________________________________________________________________________
activation_17 (Activation) (None, 512, 8, 32) 0 add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D) (None, 12, 8, 32) 6156 activation_17[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D) (None, 3, 8, 32) 1539 activation_17[0][0]
==================================================================================================
Total params: 11,203,023
Trainable params: 11,183,823
Non-trainable params: 19,200
__________________________________________________________________________________________________
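The shapes and parameter counts in this summary can be checked by hand. The input is 3×128×512 (channels first) and the output grid is 8×32, so the backbone downsamples by 16. output_cov has one channel per class (3) and output_bbox has four box coordinates per class (12). Both heads match a 1×1 convolution with bias over the 512 channels of activation_17, and conv1 matches a 7×7 convolution from 3 to 64 channels. The helper `conv_params` is a hypothetical name used only for this check.

```python
def conv_params(kernel, in_ch, out_ch, bias=True):
    """Parameters of a 2D conv: kernel*kernel*in_ch*out_ch weights, plus out_ch biases."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


input_h, input_w = 128, 512   # from input_1: (None, 3, 128, 512)
stride = 16                   # input size / output grid size
num_classes = 3               # car, cyclist, pedestrian
backbone_channels = 512       # channels of activation_17

grid = (input_h // stride, input_w // stride)
cov_params = conv_params(1, backbone_channels, num_classes)
bbox_params = conv_params(1, backbone_channels, 4 * num_classes)
conv1_params = conv_params(7, 3, 64)

print(grid, cov_params, bbox_params, conv1_params)
```

This reproduces the 8×32 output grid and the 1,539 (output_cov), 6,156 (output_bbox), and 9,472 (conv1) parameters reported in the summary.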
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-03-10 17:56:34,721 [INFO] iva.detectnet_v2.scripts.train: Found 6359 samples in training set
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-03-10 17:56:50,115 [INFO] iva.detectnet_v2.scripts.train: Found 1122 samples in validation set
INFO:tensorflow:Create CheckpointSaverHook.
2020-03-10 17:57:03,418 [INFO] tensorflow: Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2020-03-10 17:57:04,534 [INFO] tensorflow: Graph was finalized.
2020-03-10 17:57:04.535111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-10 17:57:04.535159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-10 17:57:04.535187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-10 17:57:04.535194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-10 17:57:04.535313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6998 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Running local_init_op.
2020-03-10 17:57:07,660 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2020-03-10 17:57:08,314 [INFO] tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2020-03-10 17:57:36,451 [INFO] tensorflow: Saving checkpoints for step-0.
2020-03-10 17:58:19.675048: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2020-03-10 17:58:19.947042: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x5fee740
INFO:tensorflow:epoch = 0.0, loss = 0.08104235, step = 0
2020-03-10 17:58:24,004 [INFO] tensorflow: epoch = 0.0, loss = 0.08104235, step = 0
2020-03-10 17:58:24,006 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/task_progress_monitor_hook.pyc: Epoch 0/120: loss: 0.08104 Time taken: 0:00:00 ETA: 0:00:00
INFO:tensorflow:epoch = 0.005025125628140704, loss = 0.07796858, step = 2 (12.107 sec)
2020-03-10 17:58:36,111 [INFO] tensorflow: epoch = 0.005025125628140704, loss = 0.07796858, step = 2 (12.107 sec)
2020-03-10 17:58:37,909 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 16.226
INFO:tensorflow:global_step/sec: 1.8675
2020-03-10 17:58:44,889 [INFO] tensorflow: global_step/sec: 1.8675
INFO:tensorflow:epoch = 0.10050251256281408, loss = 0.049496885, step = 40 (8.868 sec)
2020-03-10 17:58:44,979 [INFO] tensorflow: epoch = 0.10050251256281408, loss = 0.049496885, step = 40 (8.868 sec)
2020-03-10 17:58:45,711 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 51.275
2020-03-10 17:58:47,717 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 199.452
INFO:tensorflow:global_step/sec: 12.3764
2020-03-10 17:58:48,040 [INFO] tensorflow: global_step/sec: 12.3764
2020-03-10 17:58:49,728 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.929
INFO:tensorflow:epoch = 0.2613065326633166, loss = 0.013850397, step = 104 (5.148 sec)
2020-03-10 17:58:50,126 [INFO] tensorflow: epoch = 0.2613065326633166, loss = 0.013850397, step = 104 (5.148 sec)
INFO:tensorflow:global_step/sec: 12.4458
2020-03-10 17:58:51,173 [INFO] tensorflow: global_step/sec: 12.4458
2020-03-10 17:58:51,737 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 199.081
2020-03-10 17:58:53,742 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 199.494
INFO:tensorflow:global_step/sec: 12.4714
2020-03-10 17:58:54,301 [INFO] tensorflow: global_step/sec: 12.4714
INFO:tensorflow:epoch = 0.4221105527638191, loss = 0.0035675862, step = 168 (5.131 sec)
2020-03-10 17:58:55,257 [INFO] tensorflow: epoch = 0.4221105527638191, loss = 0.0035675862, step = 168 (5.131 sec)
2020-03-10 17:58:55,746 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 199.704
INFO:tensorflow:global_step/sec: 12.3694
2020-03-10 17:58:57,453 [INFO] tensorflow: global_step/sec: 12.3694
2020-03-10 17:58:57,777 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 196.937
2020-03-10 17:58:59,794 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.315
INFO:tensorflow:epoch = 0.5804020100502513, loss = 0.0027891295, step = 231 (5.102 sec)
2020-03-10 17:59:00,359 [INFO] tensorflow: epoch = 0.5804020100502513, loss = 0.0027891295, step = 231 (5.102 sec)
INFO:tensorflow:global_step/sec: 12.3756
2020-03-10 17:59:00,605 [INFO] tensorflow: global_step/sec: 12.3756
2020-03-10 17:59:01,814 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.064
INFO:tensorflow:global_step/sec: 12.4258
2020-03-10 17:59:03,743 [INFO] tensorflow: global_step/sec: 12.4258
2020-03-10 17:59:03,827 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.771
INFO:tensorflow:epoch = 0.7412060301507538, loss = 0.0019985866, step = 295 (5.156 sec)
2020-03-10 17:59:05,515 [INFO] tensorflow: epoch = 0.7412060301507538, loss = 0.0019985866, step = 295 (5.156 sec)
2020-03-10 17:59:05,841 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.595
INFO:tensorflow:global_step/sec: 12.3988
2020-03-10 17:59:06,889 [INFO] tensorflow: global_step/sec: 12.3988
2020-03-10 17:59:07,858 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.285
2020-03-10 17:59:09,864 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 199.485
INFO:tensorflow:global_step/sec: 12.4174
2020-03-10 17:59:10,030 [INFO] tensorflow: global_step/sec: 12.4174
INFO:tensorflow:epoch = 0.8994974874371859, loss = 0.0016591488, step = 358 (5.083 sec)
2020-03-10 17:59:10,598 [INFO] tensorflow: epoch = 0.8994974874371859, loss = 0.0016591488, step = 358 (5.083 sec)
2020-03-10 17:59:11,883 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.096
INFO:tensorflow:global_step/sec: 12.3317
2020-03-10 17:59:13,192 [INFO] tensorflow: global_step/sec: 12.3317
2020-03-10 17:59:14,363 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 70, 0.00s/step
/usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/evaluation/metadata.py:38: UserWarning: One or more metadata field(s) are missing from ground_truth batch_data, and will be replaced with defaults: ['frame/camera_location']
2020-03-10 17:59:21,060 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 70, 0.67s/step
2020-03-10 17:59:23,421 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 70, 0.24s/step
2020-03-10 17:59:25,764 [INFO] iva.detectnet_v2.evaluation.evaluation: step 30 / 70, 0.23s/step
2020-03-10 17:59:28,108 [INFO] iva.detectnet_v2.evaluation.evaluation: step 40 / 70, 0.23s/step
2020-03-10 17:59:30,458 [INFO] iva.detectnet_v2.evaluation.evaluation: step 50 / 70, 0.24s/step
2020-03-10 17:59:32,803 [INFO] iva.detectnet_v2.evaluation.evaluation: step 60 / 70, 0.23s/step
Matching predictions to ground truth, class 1/3.: 100%|#| 8/8 [00:00<00:00, 8790.79it/s]
Matching predictions to ground truth, class 2/3.: 100%|#| 9/9 [00:00<00:00, 8167.19it/s]
Matching predictions to ground truth, class 3/3.: 100%|#| 1/1 [00:00<00:00, 889.57it/s]
/usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/evaluation/compute_metrics.py:717: RuntimeWarning: invalid value encountered in true_divide
Epoch 1/120
=========================
Validation cost: 0.000008
Mean average_precision (in %): 0.0000
class name average precision (in %)
------------ --------------------------
car 0
cyclist 0
pedestrian 0
Median Inference Time: 0.003606
INFO:tensorflow:epoch = 1.0, loss = 0.00052329456, step = 398 (24.667 sec)
2020-03-10 17:59:35,265 [INFO] tensorflow: epoch = 1.0, loss = 0.00052329456, step = 398 (24.667 sec)
2020-03-10 17:59:35,265 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/task_progress_monitor_hook.pyc: Epoch 1/120: loss: 0.00052 Time taken: 0:01:21.906016 ETA: 2:42:26.815904
2020-03-10 17:59:35,359 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 17.039
2020-03-10 17:59:37,375 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.457
INFO:tensorflow:global_step/sec: 1.58629
2020-03-10 17:59:37,778 [INFO] tensorflow: global_step/sec: 1.58629
2020-03-10 17:59:39,390 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.539
INFO:tensorflow:epoch = 1.1582914572864322, loss = 0.00058055145, step = 461 (5.096 sec)
2020-03-10 17:59:40,361 [INFO] tensorflow: epoch = 1.1582914572864322, loss = 0.00058055145, step = 461 (5.096 sec)
INFO:tensorflow:global_step/sec: 12.3628
2020-03-10 17:59:40,933 [INFO] tensorflow: global_step/sec: 12.3628
2020-03-10 17:59:41,421 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 196.973
2020-03-10 17:59:43,440 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.187
INFO:tensorflow:global_step/sec: 12.2945
2020-03-10 17:59:44,105 [INFO] tensorflow: global_step/sec: 12.2945
INFO:tensorflow:epoch = 1.3165829145728642, loss = 0.00052331056, step = 524 (5.131 sec)
2020-03-10 17:59:45,492 [INFO] tensorflow: epoch = 1.3165829145728642, loss = 0.00052331056, step = 524 (5.131 sec)
2020-03-10 17:59:45,493 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 194.860
INFO:tensorflow:global_step/sec: 12.3807
2020-03-10 17:59:47,255 [INFO] tensorflow: global_step/sec: 12.3807
2020-03-10 17:59:47,499 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 199.339
2020-03-10 17:59:49,515 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 198.521
INFO:tensorflow:global_step/sec: 12.3445
2020-03-10 17:59:50,414 [INFO] tensorflow: global_step/sec: 12.3445
INFO:tensorflow:epoch = 1.4748743718592965, loss = 0.00052351336, step = 587 (5.101 sec)
2020-03-10 17:59:50,594 [INFO] tensorflow: epoch = 1.4748743718592965, loss = 0.00052351336, step = 587 (5.101 sec)
2020-03-10 17:59:51,553 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 196.228
INFO:tensorflow:global_step/sec: 12.3396
2020-03-10 17:59:53,575 [INFO] tensorflow: global_step/sec: 12.3396
2020-03-10 17:59:53,575 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 197.819
2020-03-10 17:59:55,607 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/tfhooks/sample_counter_hook.pyc: Samples / sec: 196.891
INFO:tensorflow:epoch = 1.6331658291457287, loss = 0.00052330946, step = 650 (5.095 sec)
2020-03-10 17:59:55,688 [INFO] tensorflow: epoch = 1.6331658291457287, loss = 0.00052330946, step = 650 (5.095 sec)
---------------------------------------------------------------------
---------------------------------------------------------------------
Epoch 120/120
=========================
Validation cost: 0.000006
Mean average_precision (in %): 0.0000
class name average precision (in %)
------------ --------------------------
car 0
cyclist 0
pedestrian 0
Median Inference Time: 0.003395
Time taken to run iva.detectnet_v2.scripts.train:main: 1:10:33.431308.
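The epoch counter in the training log above appears to be the global step divided by a fixed number of steps per epoch. The minimal sketch below recovers that number from the logged (epoch, step) pairs and compares it with ceil(num_images / batch_size). The training-set size of 6,359 images is an assumption: it is the 7,481 KITTI training images minus the 1,122 validation samples that tlt-evaluate reports further down. The log itself does not print it.

```python
import math
import re

# Lines copied from the tlt-train log above.
LOG_LINES = [
    "epoch = 1.1582914572864322, loss = 0.00058055145, step = 461 (5.096 sec)",
    "epoch = 1.3165829145728642, loss = 0.00052331056, step = 524 (5.131 sec)",
    "epoch = 1.4748743718592965, loss = 0.00052351336, step = 587 (5.101 sec)",
    "epoch = 1.6331658291457287, loss = 0.00052330946, step = 650 (5.095 sec)",
]

PATTERN = re.compile(r"epoch = ([\d.]+), loss = ([\d.e-]+), step = (\d+)")


def steps_per_epoch_from_log(lines):
    """Return step / epoch, rounded, for every line that matches PATTERN."""
    ratios = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            epoch, step = float(match.group(1)), int(match.group(3))
            ratios.append(round(step / epoch))
    return ratios


def expected_steps_per_epoch(num_images, batch_size):
    """Each step consumes one batch; a partial last batch still costs a step."""
    return math.ceil(num_images / batch_size)


print(steps_per_epoch_from_log(LOG_LINES))
# ASSUMPTION: 7481 KITTI training images, 1122 of them held out for validation.
print(expected_steps_per_epoch(7481 - 1122, 16))
```

Both calls give 398. The same formula gives ceil(1122 / 16) = 71, which is consistent with the "step 0 / 71" line that tlt-evaluate prints later in this thread.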
My train.txt file looks like this:
random_seed: 42
model_config {
pretrained_model_file: "/workspace/pretrained_model/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
num_layers: 18
freeze_blocks: 0
arch: "resnet"
use_batch_norm: true
activation {
activation_type: "relu"
}
dropout_rate: 0.1
objective_set: {
cov {}
bbox {
scale: 35.0
offset: 0.5
}
}
training_precision {
backend_floatx: FLOAT32
}
}
bbox_rasterizer_config {
target_class_config {
key: "car"
value: {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.4
cov_radius_y: 0.4
bbox_min_radius: 1.0
}
}
target_class_config {
key: "pedestrian"
value: {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: "cyclist"
value: {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.67
}
cost_function_config {
target_classes {
name: "car"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "pedestrian"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "cyclist"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: True
max_objective_weight: 0.9999
min_objective_weight: 0.0001
}
training_config {
batch_size_per_gpu: 16
num_epochs: 120
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-6
max_learning_rate: 5e-4
soft_start: 0.1
annealing: 0.7
}
}
regularizer {
type: L1
weight: 3e-9
}
optimizer {
adam {
epsilon: 1e-08
beta1: 0.9
beta2: 0.999
}
}
cost_scaling {
enabled: False
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 10
}
augmentation_config {
preprocessing {
output_image_width: 512
output_image_height: 128
output_image_channel: 3
min_bbox_width: 1.0
min_bbox_height: 1.0
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 25.0
saturation_shift_max: 0.2
contrast_scale_max: 0.1
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: "car"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.13
dbscan_min_samples: 0.05
minimum_bounding_box_height: 1
}
}
}
target_class_config {
key: "pedestrian"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 1
}
}
}
target_class_config {
key: "cyclist"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 1
}
}
}
}
dataset_config {
data_sources: {
tfrecords_path: "/workspace/tf_records/*"
image_directory_path: "/workspace/dataset/KITTI_original/training"
}
image_extension: "jpg"
target_class_mapping {
key: "car"
value: "car"
}
target_class_mapping {
key: "pedestrian"
value: "pedestrian"
}
target_class_mapping {
key: "cyclist"
value: "cyclist"
}
validation_fold: 0
}
evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 1
minimum_detection_ground_truth_overlap {
key: "car"
value: 0.7
}
minimum_detection_ground_truth_overlap {
key: "pedestrian"
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: "cyclist"
value: 0.5
}
evaluation_box_config {
key: "car"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
evaluation_box_config {
key: "pedestrian"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
evaluation_box_config {
key: "cyclist"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
}
I am not sure what I am doing wrong. Please help.
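The learning_rate block in this spec uses soft_start_annealing_schedule. My reading of the TLT documentation is that soft_start and annealing are fractions of the whole training run. The rate starts at min_learning_rate and ramps up to max_learning_rate during the first soft_start fraction. It then holds at max_learning_rate until the annealing point, after which it decays back to min_learning_rate. The minimal sketch below assumes exponential (log-space) interpolation for both the ramp and the decay. That shape is an assumption, and the exact curve TLT uses may differ.

```python
import math


def soft_start_annealing_lr(progress, min_lr=5e-6, max_lr=5e-4,
                            soft_start=0.1, annealing=0.7):
    """Learning rate at `progress`, the fraction of training done (0.0 to 1.0).

    ASSUMPTION: ramp-up and decay interpolate exponentially (linearly in
    log space) between min_lr and max_lr; TLT's exact curve may differ.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    log_min, log_max = math.log(min_lr), math.log(max_lr)
    if progress < soft_start:
        # Soft start: ramp from min_lr up to max_lr.
        t = progress / soft_start
        return math.exp(log_min + t * (log_max - log_min))
    if progress < annealing or annealing >= 1.0:
        # Plateau at max_lr until annealing begins.
        return max_lr
    # Annealing: decay from max_lr back down to min_lr by the end of training.
    t = (progress - annealing) / (1.0 - annealing)
    return math.exp(log_max + t * (log_min - log_max))


num_epochs = 120  # num_epochs from training_config in the spec above
for epoch in (0, 6, 12, 60, 84, 102, 120):
    print(epoch, soft_start_annealing_lr(epoch / num_epochs))
```

With these values, the peak rate would be reached at epoch 12 (0.1 x 120) and the decay would start at epoch 84 (0.7 x 120).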
You have set the values below in your training spec, and they do not match the actual image size of the KITTI dataset. KITTI images are about 1248x384, so please change these values to 1248x384.
output_image_width: 512
output_image_height: 128
Also, the TLT 1.0.1 docker includes Jupyter notebooks for reference, and they come with default training specs for the KITTI dataset.
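One way to arrive at a spec size like 1248x384 is to round the typical image size up to a multiple of 16. The minimal sketch below makes several assumptions. It assumes DetectNet_v2 needs output_image_width and output_image_height to be multiples of 16. It also assumes the network has an output stride of 16, which is consistent with the 24x78 output_cov grid for a 1248x384 input in the model summary later in this thread. Finally, it uses 1242x375 as the typical KITTI image size.

```python
STRIDE = 16  # ASSUMPTION: DetectNet_v2 input sizes must be multiples of 16.


def round_up(value, multiple=STRIDE):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def spec_size_for(width, height):
    """Suggested (output_image_width, output_image_height) for an image size."""
    return round_up(width), round_up(height)


def output_grid(spec_width, spec_height, stride=STRIDE):
    """(rows, cols) of the coverage/bbox output grid for a given input size."""
    if spec_width % stride or spec_height % stride:
        raise ValueError("spec size must be a multiple of %d" % stride)
    return spec_height // stride, spec_width // stride


print(spec_size_for(1242, 375))  # typical KITTI image -> (1248, 384)
print(output_grid(1248, 384))    # grid for the corrected spec
print(output_grid(512, 128))     # grid for the original 512x128 spec
```

The corrected spec gives a 24x78 grid. The original 512x128 spec gives only an 8x32 grid.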
Thanks. I have updated those parameters, and training now works well, with the following result:
Epoch 120/120
=========================
Validation cost: 0.000082
Mean average_precision (in %): 71.5213
class name average precision (in %)
------------ --------------------------
car 85.4542
cyclist 56.6675
pedestrian 72.442
Median Inference Time: 0.011675
Time taken to run iva.detectnet_v2.scripts.train:main: 5:36:31.637871.
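The "Mean average_precision" line appears to be the unweighted mean of the three per-class APs. The quick check below uses the values from the log above. It gives 71.5212 against the logged 71.5213. That last-digit difference is consistent with the per-class values being printed rounded.

```python
def mean_average_precision(per_class_ap):
    """Unweighted mean of per-class average precision values (in %)."""
    if not per_class_ap:
        raise ValueError("need at least one class")
    return sum(per_class_ap.values()) / len(per_class_ap)


# Per-class AP (in %) from the 1248x384 training run above.
results = {"car": 85.4542, "cyclist": 56.6675, "pedestrian": 72.442}
print(round(mean_average_precision(results), 4))  # prints 71.5212
```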
But when I run evaluation with
!tlt-evaluate detectnet_v2 -e $SPECS_DIR/train.txt \
-m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet10_detector.tlt \
-k $KEY
it throws this error:
Using TensorFlow backend.
2020-03-12 06:39:52,966 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/spec_files/train.txt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-03-12 06:39:53,506 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-03-12 06:39:54.628252: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-12 06:39:54.735884: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-12 06:39:54.736622: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x8097390 executing computations on platform CUDA. Devices:
2020-03-12 06:39:54.736657: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-03-12 06:39:54.758318: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3408000000 Hz
2020-03-12 06:39:54.758873: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x81000e0 executing computations on platform Host. Devices:
2020-03-12 06:39:54.758909: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2020-03-12 06:39:54.759088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 6.89GiB
2020-03-12 06:39:54.759118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-12 06:39:54.760118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-12 06:39:54.760132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-12 06:39:54.760139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-12 06:39:54.760237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6707 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
/usr/local/lib/python2.7/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2020-03-12 06:39:55,753 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-03-12 06:40:01,867 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/evaluation/build_evaluator.pyc: Found 1122 samples in validation set
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 384, 1248) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 192, 624) 9472 input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 192, 624) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 192, 624) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 64, 96, 312) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_2[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 96, 312) 4160 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 96, 312) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 96, 312) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 64, 96, 312) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_3[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 64, 96, 312) 0 block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_4[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_2[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 64, 96, 312) 0 block_1b_bn_2[0][0]
activation_3[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 64, 96, 312) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 48, 156) 73856 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 128, 48, 156) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_6[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 48, 156) 8320 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 48, 156) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 128, 48, 156) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 128, 48, 156) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (None, 128, 48, 156) 147584 activation_7[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 128, 48, 156) 0 block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_8[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_2[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 128, 48, 156) 0 block_2b_bn_2[0][0]
activation_7[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 128, 48, 156) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 24, 78) 295168 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 256, 24, 78) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_10[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 24, 78) 33024 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 24, 78) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 256, 24, 78) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_11 (Activation) (None, 256, 24, 78) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (None, 256, 24, 78) 590080 activation_11[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
activation_12 (Activation) (None, 256, 24, 78) 0 block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_12[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_2[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 256, 24, 78) 0 block_3b_bn_2[0][0]
activation_11[0][0]
__________________________________________________________________________________________________
activation_13 (Activation) (None, 256, 24, 78) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 24, 78) 1180160 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
activation_14 (Activation) (None, 512, 24, 78) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_14[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 24, 78) 131584 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 24, 78) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 512, 24, 78) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_15 (Activation) (None, 512, 24, 78) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (None, 512, 24, 78) 2359808 activation_15[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
activation_16 (Activation) (None, 512, 24, 78) 0 block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_16[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_2[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 512, 24, 78) 0 block_4b_bn_2[0][0]
activation_15[0][0]
__________________________________________________________________________________________________
activation_17 (Activation) (None, 512, 24, 78) 0 add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D) (None, 12, 24, 78) 6156 activation_17[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D) (None, 3, 24, 78) 1539 activation_17[0][0]
==================================================================================================
Total params: 11,203,023
Trainable params: 11,183,823
Non-trainable params: 19,200
__________________________________________________________________________________________________
INFO:tensorflow:Graph was finalized.
2020-03-12 06:40:09,368 [INFO] tensorflow: Graph was finalized.
2020-03-12 06:40:09.369112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-12 06:40:09.369166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-12 06:40:09.369174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-12 06:40:09.369180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-12 06:40:09.369294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6707 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Running local_init_op.
2020-03-12 06:40:10,590 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2020-03-12 06:40:10,887 [INFO] tensorflow: Done running local_init_op.
2020-03-12 06:40:12,780 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 71, 0.00s/step
2020-03-12 06:40:16.856709: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2020-03-12 06:40:17.693190: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-12 06:40:17.697520: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "/usr/local/bin/tlt-evaluate", line 10, in <module>
sys.exit(main())
File "./common/magnet_evaluate.py", line 38, in main
File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
File "./detectnet_v2/scripts/evaluate.py", line 126, in main
File "./detectnet_v2/evaluation/evaluation.py", line 156, in evaluate
File "./detectnet_v2/evaluation/evaluation.py", line 116, in _get_validation_iterator
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node resnet18_nopool_bn_detectnet_v2/conv1/convolution (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:93) ]]
[[node strided_slice_352 (defined at ./detectnet_v2/model/utilities.py:53) ]]
Caused by op u'resnet18_nopool_bn_detectnet_v2/conv1/convolution', defined at:
File "/usr/local/bin/tlt-evaluate", line 10, in <module>
sys.exit(main())
File "./common/magnet_evaluate.py", line 38, in main
File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
File "./detectnet_v2/scripts/evaluate.py", line 119, in main
File "./detectnet_v2/evaluation/build_evaluator.py", line 124, in build_evaluator_for_trained_gridbox
File "./detectnet_v2/model/utilities.py", line 26, in _fn_wrapper
File "./detectnet_v2/model/detectnet_model.py", line 617, in build_validation_graph
File "./detectnet_v2/model/utilities.py", line 26, in _fn_wrapper
File "./detectnet_v2/model/detectnet_model.py", line 577, in build_inference_graph
File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 564, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 721, in run_internal_graph
layer.call(computed_tensor, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/convolutional.py", line 171, in call
dilation_rate=self.dilation_rate)
File "/opt/nvidia/third_party/keras/tensorflow_backend.py", line 93, in conv2d
data_format=tf_data_format)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
return op(input, filter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
return self.conv_op(inp, filter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
return self.call(inp, filter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
name=self.name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node resnet18_nopool_bn_detectnet_v2/conv1/convolution (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:93) ]]
[[node strided_slice_352 (defined at ./detectnet_v2/model/utilities.py:53) ]]
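The model summary that tlt-evaluate printed above can be checked by hand. The sketch below assumes that both DetectNet_v2 heads are 1x1 convolutions with bias on the 512-channel final feature map. It also assumes one coverage channel per class and four bbox channels per class. The parameter count uses the standard k*k*c_in*c_out + c_out formula for a convolution with bias.

```python
def conv2d_params(kernel, in_channels, out_channels, bias=True):
    """Weights (+ biases) of a 2D convolution with a square kernel."""
    return kernel * kernel * in_channels * out_channels + (out_channels if bias else 0)


def detectnet_heads(num_classes, feature_channels=512):
    """(output channels, parameter count) of the cov and bbox heads.

    ASSUMPTION: both heads are 1x1 convolutions with bias; cov has one
    channel per class and bbox has four (box coordinates) per class.
    """
    cov_channels = num_classes
    bbox_channels = 4 * num_classes
    return {
        "output_cov": (cov_channels, conv2d_params(1, feature_channels, cov_channels)),
        "output_bbox": (bbox_channels, conv2d_params(1, feature_channels, bbox_channels)),
    }


heads = detectnet_heads(num_classes=3)  # car, pedestrian, cyclist
print(heads)
print(conv2d_params(7, 3, 64))   # conv1: 7x7 kernel, RGB input, 64 filters
print(conv2d_params(3, 64, 64))  # block_1a_conv_1
```

These calls reproduce the 1539 and 6156 parameters of output_cov and output_bbox, the 9472 of conv1 and the 36928 of block_1a_conv_1 listed in the summary.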
According to your training spec, you were training a resnet18 tlt model.
Why is resnet10_detector.tlt in your tlt-evaluate command?
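A mismatch like this can be caught before launching tlt-evaluate. The minimal sketch below pulls num_layers and pretrained_model_file out of a spec and compares them with the depth in the -m checkpoint name. It assumes the files follow the resnetNN naming used in this thread (for example resnet18.hdf5 and resnet10_detector.tlt). That naming is only a convention and is not something TLT enforces. The helper names are hypothetical.

```python
import os
import re

DEPTH_IN_NAME = re.compile(r"resnet(\d+)")


def depth_from_name(path):
    """ResNet depth in a file name such as resnet18.hdf5, or None if absent."""
    match = DEPTH_IN_NAME.search(os.path.basename(path))
    return int(match.group(1)) if match else None


def check_depths(spec_text, model_path):
    """List the depth implied by each source when they disagree, else []."""
    layers = re.search(r"num_layers:\s*(\d+)", spec_text)
    pretrained = re.search(r'pretrained_model_file:\s*"([^"]+)"', spec_text)
    depths = {
        "num_layers": int(layers.group(1)) if layers else None,
        "pretrained_model_file": depth_from_name(pretrained.group(1)) if pretrained else None,
        "-m model": depth_from_name(model_path),
    }
    known = {name: d for name, d in depths.items() if d is not None}
    if len(set(known.values())) <= 1:
        return []  # all sources agree (or too little information to compare)
    return ["%s implies resnet%d" % (name, d) for name, d in sorted(known.items())]


spec = '''
model_config {
  pretrained_model_file: "/workspace/pretrained_model/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
  num_layers: 18
}
'''
print(check_depths(spec, "experiment_dir_unpruned/weights/resnet10_detector.tlt"))
```

Run against the second spec posted later in this thread, the same check would also flag that pretrained_model_file points at resnet10.hdf5 while num_layers is 18.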
I tried resnet18 previously. The spec I most recently trained with looks like this:
random_seed: 42
model_config {
pretrained_model_file: "/workspace/pretrained_model/tlt_resnet10_detectnet_v2_v1/resnet10.hdf5"
num_layers: 18
freeze_blocks: 0
arch: "resnet"
use_batch_norm: true
activation {
activation_type: "relu"
}
dropout_rate: 0.1
objective_set: {
cov {}
bbox {
scale: 35.0
offset: 0.5
}
}
training_precision {
backend_floatx: FLOAT32
}
}
bbox_rasterizer_config {
target_class_config {
key: "car"
value: {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.4
cov_radius_y: 0.4
bbox_min_radius: 1.0
}
}
target_class_config {
key: "pedestrian"
value: {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: "cyclist"
value: {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.67
}
cost_function_config {
target_classes {
name: "car"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "pedestrian"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "cyclist"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: True
max_objective_weight: 0.9999
min_objective_weight: 0.0001
}
training_config {
batch_size_per_gpu: 16
num_epochs: 120
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-6
max_learning_rate: 5e-4
soft_start: 0.1
annealing: 0.7
}
}
regularizer {
type: L1
weight: 3e-9
}
optimizer {
adam {
epsilon: 1e-08
beta1: 0.9
beta2: 0.999
}
}
cost_scaling {
enabled: False
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 10
}
augmentation_config {
preprocessing {
output_image_width: 1248
output_image_height: 384
output_image_channel: 3
min_bbox_width: 1.0
min_bbox_height: 1.0
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 25.0
saturation_shift_max: 0.2
contrast_scale_max: 0.1
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: "car"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.13
dbscan_min_samples: 0.05
minimum_bounding_box_height: 1
}
}
}
target_class_config {
key: "pedestrian"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 1
}
}
}
target_class_config {
key: "cyclist"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 1
}
}
}
}
dataset_config {
data_sources: {
tfrecords_path: "/workspace/tf_records/*"
image_directory_path: "/workspace/dataset/KITTI_original/training"
}
image_extension: "jpg"
target_class_mapping {
key: "car"
value: "car"
}
target_class_mapping {
key: "pedestrian"
value: "pedestrian"
}
target_class_mapping {
key: "cyclist"
value: "cyclist"
}
validation_fold: 0
}
evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 1
minimum_detection_ground_truth_overlap {
key: "car"
value: 0.7
}
minimum_detection_ground_truth_overlap {
key: "pedestrian"
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: "cyclist"
value: 0.5
}
evaluation_box_config {
key: "car"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
evaluation_box_config {
key: "pedestrian"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
evaluation_box_config {
key: "cyclist"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
}
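The spec above fixes the network input at 1248x384 with three target classes (car, pedestrian, cyclist), so the shapes of the two DetectNet_v2 output heads can be worked out in advance and compared with the model summary that tlt-train or tlt-evaluate prints. The sketch below uses a hypothetical helper, not part of TLT. It assumes a total output stride of 16 and four box coordinates per class in output_bbox, and it treats multiples of 16 as a requirement for the input size, as the TLT documentation describes for the preprocessing output size.

```python
def detectnet_v2_head_shapes(width, height, num_classes, stride=16):
    """Hypothetical helper: expected (channels, rows, cols) of each output head.

    Assumes an output stride of 16 and 4 bbox coordinates per class.
    """
    # Reject input sizes that do not divide evenly by the stride.
    if width % stride or height % stride:
        raise ValueError("width and height should be multiples of %d" % stride)
    rows, cols = height // stride, width // stride
    return {
        "output_bbox": (4 * num_classes, rows, cols),  # x1, y1, x2, y2 per class
        "output_cov": (num_classes, rows, cols),       # one coverage map per class
    }


# Values from the preprocessing block and target classes in the spec above.
shapes = detectnet_v2_head_shapes(1248, 384, 3)
print(shapes["output_bbox"])  # (12, 24, 78)
print(shapes["output_cov"])   # (3, 24, 78)
```

If the summary printed at startup shows different head shapes, the spec and the model being loaded probably disagree on input size or class count.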
Sorry, there might be an issue with num_layers: 18. I'll correct it and post updates.
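The problem here is a disagreement between the depth encoded in pretrained_model_file (resnet10.hdf5) and num_layers: 18, so a quick pre-flight check on the spec can catch it before a long training run. The sketch below is a hypothetical helper, not part of TLT: it extracts both fields from the spec text with regular expressions and assumes the pretrained file name contains resnet<depth>, as the file in this spec does.

```python
import re


def backbone_depth_mismatch(spec_text):
    """Hypothetical helper: return (file_depth, num_layers) if they disagree, else None.

    Assumes the pretrained model file name contains 'resnet<depth>'.
    """
    path = re.search(r'pretrained_model_file:\s*"([^"]+)"', spec_text)
    layers = re.search(r'num_layers:\s*(\d+)', spec_text)
    if not path or not layers:
        return None  # one of the fields is missing, nothing to compare
    # Look only at the file name, not the directory part of the path.
    name_depth = re.search(r'resnet(\d+)', path.group(1).split("/")[-1])
    if not name_depth:
        return None  # file name does not follow the assumed convention
    file_depth, num_layers = int(name_depth.group(1)), int(layers.group(1))
    return None if file_depth == num_layers else (file_depth, num_layers)


# Fragment of the model_config block from the spec above.
spec = '''model_config {
pretrained_model_file: "/workspace/pretrained_model/tlt_resnet10_detectnet_v2_v1/resnet10.hdf5"
num_layers: 18
}'''
print(backbone_depth_mismatch(spec))  # (10, 18)
```

To check a spec on disk, pass the file contents instead, for example backbone_depth_mismatch(open("/workspace/spec_files/train.txt").read()); that path is the one shown in the log and may differ on another setup.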
The error remains the same:
Using TensorFlow backend.
2020-03-12 16:13:47,534 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/spec_files/train.txt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-03-12 16:13:48,077 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-03-12 16:13:48.729739: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-12 16:13:48.831875: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-12 16:13:48.832559: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7664140 executing computations on platform CUDA. Devices:
2020-03-12 16:13:48.832594: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-03-12 16:13:48.858352: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3408000000 Hz
2020-03-12 16:13:48.859118: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x76cce90 executing computations on platform Host. Devices:
2020-03-12 16:13:48.859139: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2020-03-12 16:13:48.859275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 6.95GiB
2020-03-12 16:13:48.859292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-12 16:13:48.860084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-12 16:13:48.860097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-12 16:13:48.860105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-12 16:13:48.860186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6764 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
/usr/local/lib/python2.7/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2020-03-12 16:13:49,968 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-03-12 16:13:56,039 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/evaluation/build_evaluator.pyc: Found 1122 samples in validation set
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 384, 1248) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 192, 624) 9472 input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 192, 624) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 192, 624) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 64, 96, 312) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_2[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 96, 312) 4160 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 96, 312) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 96, 312) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 64, 96, 312) 0 add_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 48, 156) 73856 activation_3[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 128, 48, 156) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_4[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 48, 156) 8320 activation_3[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 48, 156) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 128, 48, 156) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 128, 48, 156) 0 add_2[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 24, 78) 295168 activation_5[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 256, 24, 78) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_6[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 24, 78) 33024 activation_5[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 24, 78) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 256, 24, 78) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 256, 24, 78) 0 add_3[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 24, 78) 1180160 activation_7[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 512, 24, 78) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_8[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 24, 78) 131584 activation_7[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 24, 78) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 512, 24, 78) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 512, 24, 78) 0 add_4[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D) (None, 12, 24, 78) 6156 activation_9[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D) (None, 3, 24, 78) 1539 activation_9[0][0]
==================================================================================================
Total params: 4,926,543
Trainable params: 4,911,183
Non-trainable params: 15,360
__________________________________________________________________________________________________
INFO:tensorflow:Graph was finalized.
2020-03-12 16:14:03,516 [INFO] tensorflow: Graph was finalized.
2020-03-12 16:14:03.516861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-12 16:14:03.516900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-12 16:14:03.516909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-12 16:14:03.516916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-12 16:14:03.516994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6764 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Running local_init_op.
2020-03-12 16:14:04,576 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2020-03-12 16:14:04,847 [INFO] tensorflow: Done running local_init_op.
2020-03-12 16:14:06,510 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 71, 0.00s/step
2020-03-12 16:14:10.638651: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2020-03-12 16:14:11.514059: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-12 16:14:11.540502: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/usr/local/bin/tlt-evaluate", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_evaluate.py", line 38, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/evaluate.py", line 126, in main
  File "./detectnet_v2/evaluation/evaluation.py", line 156, in evaluate
  File "./detectnet_v2/evaluation/evaluation.py", line 116, in _get_validation_iterator
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node resnet10_nopool_bn_detectnet_v2/conv1/convolution (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:93) ]]
[[node strided_slice_355 (defined at ./detectnet_v2/model/utilities.py:53) ]]
Caused by op u'resnet10_nopool_bn_detectnet_v2/conv1/convolution', defined at:
  File "/usr/local/bin/tlt-evaluate", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_evaluate.py", line 38, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/evaluate.py", line 119, in main
  File "./detectnet_v2/evaluation/build_evaluator.py", line 124, in build_evaluator_for_trained_gridbox
  File "./detectnet_v2/model/utilities.py", line 26, in _fn_wrapper
  File "./detectnet_v2/model/detectnet_model.py", line 617, in build_validation_graph
  File "./detectnet_v2/model/utilities.py", line 26, in _fn_wrapper
  File "./detectnet_v2/model/detectnet_model.py", line 577, in build_inference_graph
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 564, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 721, in run_internal_graph
    layer.call(computed_tensor, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/convolutional.py", line 171, in call
    dilation_rate=self.dilation_rate)
  File "/opt/nvidia/third_party/keras/tensorflow_backend.py", line 93, in conv2d
    data_format=tf_data_format)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
    return op(input, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
    name=self.name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node resnet10_nopool_bn_detectnet_v2/conv1/convolution (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:93) ]]
[[node strided_slice_355 (defined at ./detectnet_v2/model/utilities.py:53) ]]