Thanks @Morganh.
Here’s the command with the complete output:
$ tlt-train detectnet_v2 -r output -e detectnet2_resnet18_train.txt -k MY_API_KEY
Using TensorFlow backend.
2019-10-10 14:31:49.783663: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-10 14:31:49.911387: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-10 14:31:49.911900: I tensorflow/compiler/xla/service/] XLA service 0x6d3c2f0 executing computations on platform CUDA. Devices:
2019-10-10 14:31:49.911927: I tensorflow/compiler/xla/service/] StreamExecutor device (0): GeForce GTX 1050 Ti with Max-Q Design, Compute Capability 6.1
2019-10-10 14:31:49.913621: I tensorflow/core/platform/profile_utils/] CPU Frequency: 2208000000 Hz
2019-10-10 14:31:49.914307: I tensorflow/compiler/xla/service/] XLA service 0x6da5fd0 executing computations on platform Host. Devices:
2019-10-10 14:31:49.914331: I tensorflow/compiler/xla/service/] StreamExecutor device (0): <undefined>, <undefined>
2019-10-10 14:31:49.914462: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
name: GeForce GTX 1050 Ti with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.4175
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.32GiB
2019-10-10 14:31:49.914481: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2019-10-10 14:31:49.915109: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-10 14:31:49.915123: I tensorflow/core/common_runtime/gpu/] 0
2019-10-10 14:31:49.915131: I tensorflow/core/common_runtime/gpu/] 0: N
2019-10-10 14:31:49.915202: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3099 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-10 14:31:49,916 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at detectnet2_resnet18_train.txt.
2019-10-10 14:31:49,916 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from detectnet2_resnet18_train.txt
WARNING:tensorflow:From ./detectnet_v2/dataloader/ tf_record_iterator (from is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
2019-10-10 14:31:49,923 [WARNING] tensorflow: From ./detectnet_v2/dataloader/ tf_record_iterator (from is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
2019-10-10 14:31:50,040 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 5219 samples with a batch size of 4; each epoch will therefore take one extra step.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-10 14:31:50,045 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/ div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2019-10-10 14:31:50,058 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/ div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, 3, 384, 1248) 0
conv1 (Conv2D) (None, 64, 192, 624) 9472 input_1[0][0]
bn_conv1 (BatchNormalization) (None, 64, 192, 624) 256 conv1[0][0]
activation_1 (Activation) (None, 64, 192, 624) 0 bn_conv1[0][0]
block_1a_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_1[0][0]
block_1a_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_1[0][0]
activation_2 (Activation) (None, 64, 96, 312) 0 block_1a_bn_1[0][0]
block_1a_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_2[0][0]
block_1a_conv_shortcut (Conv2D) (None, 64, 96, 312) 4160 activation_1[0][0]
block_1a_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_2[0][0]
block_1a_bn_shortcut (BatchNorm (None, 64, 96, 312) 256 block_1a_conv_shortcut[0][0]
add_1 (Add) (None, 64, 96, 312) 0 block_1a_bn_2[0][0]
activation_3 (Activation) (None, 64, 96, 312) 0 add_1[0][0]
block_1b_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_3[0][0]
block_1b_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_1[0][0]
activation_4 (Activation) (None, 64, 96, 312) 0 block_1b_bn_1[0][0]
block_1b_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_4[0][0]
block_1b_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_2[0][0]
add_2 (Add) (None, 64, 96, 312) 0 block_1b_bn_2[0][0]
activation_5 (Activation) (None, 64, 96, 312) 0 add_2[0][0]
block_2a_conv_1 (Conv2D) (None, 128, 48, 156) 73856 activation_5[0][0]
block_2a_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_1[0][0]
activation_6 (Activation) (None, 128, 48, 156) 0 block_2a_bn_1[0][0]
block_2a_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_6[0][0]
block_2a_conv_shortcut (Conv2D) (None, 128, 48, 156) 8320 activation_5[0][0]
block_2a_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_2[0][0]
block_2a_bn_shortcut (BatchNorm (None, 128, 48, 156) 512 block_2a_conv_shortcut[0][0]
add_3 (Add) (None, 128, 48, 156) 0 block_2a_bn_2[0][0]
activation_7 (Activation) (None, 128, 48, 156) 0 add_3[0][0]
block_2b_conv_1 (Conv2D) (None, 128, 48, 156) 147584 activation_7[0][0]
block_2b_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_1[0][0]
activation_8 (Activation) (None, 128, 48, 156) 0 block_2b_bn_1[0][0]
block_2b_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_8[0][0]
block_2b_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_2[0][0]
add_4 (Add) (None, 128, 48, 156) 0 block_2b_bn_2[0][0]
activation_9 (Activation) (None, 128, 48, 156) 0 add_4[0][0]
block_3a_conv_1 (Conv2D) (None, 256, 24, 78) 295168 activation_9[0][0]
block_3a_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_1[0][0]
activation_10 (Activation) (None, 256, 24, 78) 0 block_3a_bn_1[0][0]
block_3a_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_10[0][0]
block_3a_conv_shortcut (Conv2D) (None, 256, 24, 78) 33024 activation_9[0][0]
block_3a_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_2[0][0]
block_3a_bn_shortcut (BatchNorm (None, 256, 24, 78) 1024 block_3a_conv_shortcut[0][0]
add_5 (Add) (None, 256, 24, 78) 0 block_3a_bn_2[0][0]
activation_11 (Activation) (None, 256, 24, 78) 0 add_5[0][0]
block_3b_conv_1 (Conv2D) (None, 256, 24, 78) 590080 activation_11[0][0]
block_3b_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_1[0][0]
activation_12 (Activation) (None, 256, 24, 78) 0 block_3b_bn_1[0][0]
block_3b_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_12[0][0]
block_3b_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_2[0][0]
add_6 (Add) (None, 256, 24, 78) 0 block_3b_bn_2[0][0]
activation_13 (Activation) (None, 256, 24, 78) 0 add_6[0][0]
block_4a_conv_1 (Conv2D) (None, 512, 24, 78) 1180160 activation_13[0][0]
block_4a_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_1[0][0]
activation_14 (Activation) (None, 512, 24, 78) 0 block_4a_bn_1[0][0]
block_4a_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_14[0][0]
block_4a_conv_shortcut (Conv2D) (None, 512, 24, 78) 131584 activation_13[0][0]
block_4a_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_2[0][0]
block_4a_bn_shortcut (BatchNorm (None, 512, 24, 78) 2048 block_4a_conv_shortcut[0][0]
add_7 (Add) (None, 512, 24, 78) 0 block_4a_bn_2[0][0]
activation_15 (Activation) (None, 512, 24, 78) 0 add_7[0][0]
block_4b_conv_1 (Conv2D) (None, 512, 24, 78) 2359808 activation_15[0][0]
block_4b_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_1[0][0]
activation_16 (Activation) (None, 512, 24, 78) 0 block_4b_bn_1[0][0]
block_4b_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_16[0][0]
block_4b_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_2[0][0]
add_8 (Add) (None, 512, 24, 78) 0 block_4b_bn_2[0][0]
activation_17 (Activation) (None, 512, 24, 78) 0 add_8[0][0]
output_bbox (Conv2D) (None, 8, 24, 78) 4104 activation_17[0][0]
output_cov (Conv2D) (None, 2, 24, 78) 1026 activation_17[0][0]
Total params: 11,200,458
Trainable params: 11,181,258
Non-trainable params: 19,200
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 10, in <module>
File "./common/", line 37, in main
File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
File "./detectnet_v2/utilities/", line 46, in wrapped_fn
File "./detectnet_v2/scripts/", line 632, in main
File "./detectnet_v2/scripts/", line 556, in run_experiment
File "./detectnet_v2/scripts/", line 466, in train_gridbox
File "./detectnet_v2/scripts/", line 296, in build_training_graph
File "./detectnet_v2/dataloader/", line 203, in get_dataset_tensors
File "./detectnet_v2/dataloader/", line 244, in _generate_images_and_ground_truth_labels
File "./detectnet_v2/dataloader/", line 384, in _load_input_tensors
KeyError: 'frame/id'
From what I can tell, the TFRecords produced by the tlt-dataset-convert tool use significantly different keys from the ones written by my own TFRecord creation script, which would explain the KeyError: 'frame/id' above. That script converts my data’s original annotations, which are in PASCAL VOC format, and is based on this code from the TensorFlow object detection models API. If I could read the Python code behind the tlt-dataset-convert tool, I could probably work around this issue, but it appears to be tucked away somewhere inaccessible; at least, I’ve not managed to find where it lives in the Docker container.
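In case it helps anyone compare the two formats, here is a minimal sketch of how the feature keys inside a TFRecord file could be listed without TensorFlow installed. It assumes Python 3, the standard TFRecord framing (8-byte length, 4-byte CRC, payload, 4-byte CRC) and records that are serialized tf.train.Example messages. The function names are my own, the CRCs are skipped rather than verified, and 'frame/id' is the only required key I know of, taken from the traceback above.

```python
import struct
import sys


def read_varint(buf, pos):
    """Decode one protobuf varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire = tag >> 3, tag & 7
        if wire == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire == 1:                    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:                    # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire == 5:                    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire)
        yield field, wire, value


def example_keys(serialized):
    """Return the set of feature keys in one serialized tf.train.Example."""
    keys = set()
    # Example.features is field 1; Features.feature (a map) is field 1;
    # each map entry is a message whose field 1 is the string key.
    for field, wire, features in iter_fields(serialized):
        if field != 1 or wire != 2:
            continue
        for f2, w2, entry in iter_fields(features):
            if f2 != 1 or w2 != 2:
                continue
            for f3, w3, value in iter_fields(entry):
                if f3 == 1 and w3 == 2:
                    keys.add(value.decode("utf-8"))
    return keys


def iter_tfrecord(path):
    """Yield the raw payload of each record in a TFRecord file (CRCs not checked)."""
    with open(path, "rb") as f:
        while True:
            header = f.read(12)            # uint64 length + uint32 CRC of length
            if len(header) < 12:
                return
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            f.read(4)                      # skip CRC of payload
            yield payload


if __name__ == "__main__":
    # Usage: python list_tfrecord_keys.py path/to/file.tfrecord
    first = next(iter_tfrecord(sys.argv[1]))
    keys = example_keys(first)
    print("\n".join(sorted(keys)))
    print("missing:", sorted({"frame/id"} - keys))
```

Running it once on a file written by tlt-dataset-convert and once on a file from my own script should show exactly which keys differ, which is the information needed to adapt the conversion script.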