Thanks @Morganh.
Here’s the command with the complete output:
$ tlt-train detectnet_v2 -r output -e detectnet2_resnet18_train.txt -k MY_API_KEY
Using TensorFlow backend.
2019-10-10 14:31:49.783663: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-10 14:31:49.911387: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-10 14:31:49.911900: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6d3c2f0 executing computations on platform CUDA. Devices:
2019-10-10 14:31:49.911927: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1050 Ti with Max-Q Design, Compute Capability 6.1
2019-10-10 14:31:49.913621: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-10-10 14:31:49.914307: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6da5fd0 executing computations on platform Host. Devices:
2019-10-10 14:31:49.914331: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-10-10 14:31:49.914462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 Ti with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.4175
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.32GiB
2019-10-10 14:31:49.914481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-10 14:31:49.915109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-10 14:31:49.915123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-10-10 14:31:49.915131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-10-10 14:31:49.915202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3099 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-10 14:31:49,916 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at detectnet2_resnet18_train.txt.
2019-10-10 14:31:49,916 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from detectnet2_resnet18_train.txt
WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2019-10-10 14:31:49,923 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2019-10-10 14:31:50,040 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 5219 samples with a batch size of 4; each epoch will therefore take one extra step.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-10 14:31:50,045 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2019-10-10 14:31:50,058 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 384, 1248) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 192, 624) 9472 input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 192, 624) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 192, 624) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 64, 96, 312) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_2[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 96, 312) 4160 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 96, 312) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 96, 312) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 64, 96, 312) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_3[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 64, 96, 312) 0 block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_4[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_2[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 64, 96, 312) 0 block_1b_bn_2[0][0]
activation_3[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 64, 96, 312) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 48, 156) 73856 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 128, 48, 156) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_6[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 48, 156) 8320 activation_5[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 48, 156) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 128, 48, 156) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 128, 48, 156) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (None, 128, 48, 156) 147584 activation_7[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 128, 48, 156) 0 block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_8[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_2[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 128, 48, 156) 0 block_2b_bn_2[0][0]
activation_7[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 128, 48, 156) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 24, 78) 295168 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 256, 24, 78) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_10[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 24, 78) 33024 activation_9[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 24, 78) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 256, 24, 78) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_11 (Activation) (None, 256, 24, 78) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (None, 256, 24, 78) 590080 activation_11[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
activation_12 (Activation) (None, 256, 24, 78) 0 block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_12[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_2[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 256, 24, 78) 0 block_3b_bn_2[0][0]
activation_11[0][0]
__________________________________________________________________________________________________
activation_13 (Activation) (None, 256, 24, 78) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 24, 78) 1180160 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
activation_14 (Activation) (None, 512, 24, 78) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_14[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 24, 78) 131584 activation_13[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 24, 78) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 512, 24, 78) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_15 (Activation) (None, 512, 24, 78) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (None, 512, 24, 78) 2359808 activation_15[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
activation_16 (Activation) (None, 512, 24, 78) 0 block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_16[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_2[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 512, 24, 78) 0 block_4b_bn_2[0][0]
activation_15[0][0]
__________________________________________________________________________________________________
activation_17 (Activation) (None, 512, 24, 78) 0 add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D) (None, 8, 24, 78) 4104 activation_17[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D) (None, 2, 24, 78) 1026 activation_17[0][0]
==================================================================================================
Total params: 11,200,458
Trainable params: 11,181,258
Non-trainable params: 19,200
__________________________________________________________________________________________________
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 10, in <module>
sys.exit(main())
File "./common/magnet_train.py", line 37, in main
File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
File "./detectnet_v2/scripts/train.py", line 632, in main
File "./detectnet_v2/scripts/train.py", line 556, in run_experiment
File "./detectnet_v2/scripts/train.py", line 466, in train_gridbox
File "./detectnet_v2/scripts/train.py", line 296, in build_training_graph
File "./detectnet_v2/dataloader/default_dataloader.py", line 203, in get_dataset_tensors
File "./detectnet_v2/dataloader/default_dataloader.py", line 244, in _generate_images_and_ground_truth_labels
File "./detectnet_v2/dataloader/default_dataloader.py", line 384, in _load_input_tensors
KeyError: 'frame/id'
From what I can tell, the TFRecords produced by the tlt-dataset-convert tool use significantly different keys from those in my own TFRecord creation script, which I wrote to convert my data’s original annotations from PASCAL VOC format. My code is based on this code from the TensorFlow object detection models API. If I could access the Python code behind the tlt-dataset-convert tool, I could probably work around this issue, but it appears to be tucked away somewhere inaccessible; at least, I’ve not managed to find where it lives in the Docker container.
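One way to check the key mismatch without the tool's source is to list the feature keys stored in each TFRecord file and compare them. Below is a minimal sketch of that idea in plain Python 3, with no TensorFlow import. It assumes the standard TFRecord framing (8-byte little-endian length, 4-byte CRC, payload, 4-byte CRC) and that each record is a serialized tf.train.Example. The function names are my own and illustrative only, CRCs are not verified, and only the first record of each file is inspected.

```python
import struct


def read_varint(buf, pos):
    """Decode a protobuf varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 2:  # length-delimited: strings, bytes, sub-messages
            length, pos = read_varint(buf, pos)
            yield field_number, buf[pos:pos + length]
            pos += length
        elif wire_type == 0:  # varint: skip the value
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit: skip
            pos += 8
        elif wire_type == 5:  # fixed 32-bit: skip
            pos += 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)


def example_keys(serialized_example):
    """Return the set of feature keys in one serialized tf.train.Example."""
    keys = set()
    for number, features in iter_fields(serialized_example):
        if number != 1:  # Example.features is field 1
            continue
        for number, entry in iter_fields(features):
            if number != 1:  # Features.feature (the map) is field 1
                continue
            for number, payload in iter_fields(entry):
                if number == 1:  # the key of a map entry is field 1
                    keys.add(payload.decode("utf-8"))
    return keys


def first_record(path):
    """Return the payload of the first record in a TFRecord file."""
    with open(path, "rb") as f:
        header = f.read(12)  # uint64 length followed by a uint32 CRC
        if len(header) < 12:
            raise ValueError("file is too short to hold a TFRecord")
        (length,) = struct.unpack("<Q", header[:8])
        return f.read(length)


def missing_keys(reference_path, candidate_path):
    """Keys present in the reference file but absent from the candidate."""
    reference = example_keys(first_record(reference_path))
    candidate = example_keys(first_record(candidate_path))
    return sorted(reference - candidate)
```

For example, `missing_keys("from_tlt_dataset_convert.tfrecords", "from_my_script.tfrecords")` (both file names hypothetical) would return the keys the tool writes that my script does not. Given the traceback above, `frame/id` would be expected to appear in that list.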