Hello Morganh, thank you for the help. I tried an experiment: I changed the resolution from 800 x 600 to 1024 x 576. With that change I can successfully convert the KITTI trainval dataset to TFRecords, but I get the same error when I run TLT training.
Here is the error:
Using TensorFlow backend.
2019-11-25 10:47:32.266149: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-25 10:47:32.319302: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-25 10:47:32.319755: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5e89d70 executing computations on platform CUDA. Devices:
2019-11-25 10:47:32.319787: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 950M, Compute Capability 5.0
2019-11-25 10:47:32.322018: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2593785000 Hz
2019-11-25 10:47:32.322477: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5fa1f40 executing computations on platform Host. Devices:
2019-11-25 10:47:32.322503: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-11-25 10:47:32.322652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:0a:00.0
totalMemory: 3.95GiB freeMemory: 3.65GiB
2019-11-25 10:47:32.322675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-25 10:47:32.323227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-25 10:47:32.323242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-11-25 10:47:32.323260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-11-25 10:47:32.323324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3440 MB memory) -> physical GPU (device: 0, name: GeForce GTX 950M, pci bus id: 0000:0a:00.0, compute capability: 5.0)
2019-11-25 10:47:32,324 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at /workspace/tlt-experiments/detectnet_v2_train_resnet18_kitti.txt.
2019-11-25 10:47:32,325 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tlt-experiments/detectnet_v2_train_resnet18_kitti.txt
WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2019-11-25 10:47:32,337 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2019-11-25 10:47:32,378 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 19 samples with a batch size of 4; each epoch will therefore take one extra step.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-25 10:47:32,384 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2019-11-25 10:47:32,398 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 576, 1024) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 288, 512) 9472 input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 288, 512) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 288, 512) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 144, 256) 36928 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 144, 256) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 64, 144, 256) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 144, 256) 36928 activation_2[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 144, 256) 4160 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 144, 256) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 144, 256) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 144, 256) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 64, 144, 256) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (None, 64, 144, 256) 36928 activation_3[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 144, 256) 256 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 64, 144, 256) 0 block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (None, 64, 144, 256) 36928 activation_4[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 144, 256) 256 block_1b_conv_2[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 64, 144, 256) 0 block_1b_bn_2[0][0]
activation_3[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 64, 144, 256) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 72, 128) 73856 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 72, 128) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 128, 72, 128) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 72, 128) 147584 activation_6[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 72, 128) 8320 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 72, 128) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 72, 128) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 128, 72, 128) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 128, 72, 128) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (None, 128, 72, 128) 147584 activation_7[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 72, 128) 512 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 128, 72, 128) 0 block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (None, 128, 72, 128) 147584 activation_8[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 72, 128) 512 block_2b_conv_2[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 128, 72, 128) 0 block_2b_bn_2[0][0]
activation_7[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 128, 72, 128) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 36, 64) 295168 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 36, 64) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 256, 36, 64) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 36, 64) 590080 activation_10[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 36, 64) 33024 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 36, 64) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 36, 64) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 256, 36, 64) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_11 (Activation) (None, 256, 36, 64) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (None, 256, 36, 64) 590080 activation_11[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 36, 64) 1024 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
activation_12 (Activation) (None, 256, 36, 64) 0 block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (None, 256, 36, 64) 590080 activation_12[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 36, 64) 1024 block_3b_conv_2[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 256, 36, 64) 0 block_3b_bn_2[0][0]
activation_11[0][0]
__________________________________________________________________________________________________
activation_13 (Activation) (None, 256, 36, 64) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 36, 64) 1180160 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 36, 64) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
activation_14 (Activation) (None, 512, 36, 64) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 36, 64) 2359808 activation_14[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 36, 64) 131584 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 36, 64) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 36, 64) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 512, 36, 64) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_15 (Activation) (None, 512, 36, 64) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (None, 512, 36, 64) 2359808 activation_15[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 36, 64) 2048 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
activation_16 (Activation) (None, 512, 36, 64) 0 block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (None, 512, 36, 64) 2359808 activation_16[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 36, 64) 2048 block_4b_conv_2[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 512, 36, 64) 0 block_4b_bn_2[0][0]
activation_15[0][0]
__________________________________________________________________________________________________
activation_17 (Activation) (None, 512, 36, 64) 0 add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D) (None, 4, 36, 64) 2052 activation_17[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D) (None, 1, 36, 64) 513 activation_17[0][0]
==================================================================================================
Total params: 11,197,893
Trainable params: 11,188,165
Non-trainable params: 9,728
__________________________________________________________________________________________________
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2019-11-25 10:47:46,752 [INFO] iva.detectnet_v2.scripts.train: Found 19 samples in training set
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 10, in <module>
sys.exit(main())
File "./common/magnet_train.py", line 37, in main
File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
File "./detectnet_v2/scripts/train.py", line 632, in main
File "./detectnet_v2/scripts/train.py", line 556, in run_experiment
File "./detectnet_v2/scripts/train.py", line 479, in train_gridbox
File "./detectnet_v2/scripts/train.py", line 353, in build_validation_graph
File "./detectnet_v2/dataloader/default_dataloader.py", line 198, in get_dataset_tensors
File "./detectnet_v2/dataloader/utilities.py", line 181, in extract_tfrecords_features
StopIteration
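The traceback ends in extract_tfrecords_features, reached from build_validation_graph, so one thing that may be worth checking is whether any file matched by tfrecords_path contains zero records. Below is a minimal sketch for that check. It is not part of TLT: it assumes the standard TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC) and does not verify the CRCs. The function names count_tfrecord_records and records_per_shard are hypothetical.

```python
import glob
import os
import struct


def count_tfrecord_records(path):
    """Count records in one TFRecord file by walking its length-prefixed framing."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated record header in %s" % path)
            (length,) = struct.unpack("<Q", header)
            # Skip the 4-byte length CRC, the payload, and the 4-byte payload CRC.
            body = f.read(4 + length + 4)
            if len(body) < 4 + length + 4:
                raise ValueError("truncated record body in %s" % path)
            count += 1
    return count


def records_per_shard(pattern):
    """Map every regular file matching the glob pattern to its record count."""
    paths = [p for p in sorted(glob.glob(pattern)) if os.path.isfile(p)]
    return {p: count_tfrecord_records(p) for p in paths}


if __name__ == "__main__":
    # Same glob as tfrecords_path in the training spec.
    pattern = "/workspace/tlt-experiments/tfrecords/kitti_trainval/*"
    for path, n in sorted(records_per_shard(pattern).items()):
        print("%6d  %s%s" % (n, path, "   <-- empty" if n == 0 else ""))
```

Run inside the container, this prints one line per shard and flags the ones with no records. It uses only the standard library, so it should work under both Python 2.7 and Python 3.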
Here is detectnet_v2_train_resnet18_kitti.txt:
random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "Bola"
    value: "Bola"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1024
    output_image_height: 576
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "Bola"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/pretrained_resnet18/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "Bola"
    value: 0.699999988079
  }
  evaluation_box_config {
    key: "Bola"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "Bola"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "Bola"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}
Here is my detectnet_v2_tfrecords_kitti_trainval.txt:
kitti_config {
  root_directory_path: "/workspace/tlt-experiments/data/training"
  image_dir_name: "Bola2"
  label_dir_name: "Label2"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 5
  num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/data/training"
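A rough sanity check on these numbers, under assumptions I have not confirmed: val_split is a percentage of the whole dataset, the converter truncates when computing the validation count, and the dataset has 20 labeled images in total (a hypothetical figure that would be consistent with the 19 training samples reported in the log). Whatever the exact rounding, if the validation fold has fewer samples than num_shards, some validation shards cannot contain any record. split_estimate is a hypothetical helper, not a TLT function.

```python
def split_estimate(total_samples, val_split_percent, num_shards):
    """Estimate train/validation sizes and the minimum number of empty validation shards.

    Assumes val_split is a percentage and that the validation count is truncated;
    the real converter may round differently.
    """
    val_samples = int(total_samples * val_split_percent / 100.0)
    train_samples = total_samples - val_samples
    # Each non-empty shard needs at least one record, so at most val_samples
    # shards can be non-empty, however the records are distributed.
    min_empty_val_shards = max(0, num_shards - val_samples)
    return train_samples, val_samples, min_empty_val_shards


# Hypothetical total of 20 images, with val_split: 5 and num_shards: 10 from the spec.
print(split_estimate(20, 5, 10))  # (19, 1, 9)
```

Under these assumptions the validation fold would hold a single sample, so at least 9 of its 10 shards would be empty.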
Do you have any idea what could be causing this StopIteration?