Tensor reshape error when evaluating a Detectnet_v2 model

I am attempting to evaluate my Detectnet_v2 model after training using the instructions found here: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#evaluating_gridbox

I am getting this error:

ValueError: Cannot reshape a tensor with 98304 elements to shape [4,2,4,24,78] (59904 elements) for 'reshape_1_1/Reshape' (op: 'Reshape') with input shapes: [4,8,48,64], [5] and with input tensors computed as partial shapes: input[1] = [4,2,4,24,78].

For evaluation, I have used KITTI data that has the same shape as the training/validation data used for training. I have converted this data into TFRecords using the tool tlt-dataset-convert.

Below is the command I’ve used and the resulting stack trace:

tlt-evaluate detectnet_v2 -e specs/detectnet2_resnet18_test.txt -m output/weights/model.tlt -k ${NGC_API_KEY}
Using TensorFlow backend.
2019-10-21 21:50:37,119 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from specs/detectnet2_resnet18_test.txt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-21 21:50:37,412 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-21 21:50:38.295959: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-21 21:50:38.395926: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-21 21:50:38.396472: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f47550 executing computations on platform CUDA. Devices:
2019-10-21 21:50:38.396491: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
2019-10-21 21:50:38.418439: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2019-10-21 21:50:38.418891: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7fb02a0 executing computations on platform Host. Devices:
2019-10-21 21:50:38.418910: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-21 21:50:38.419178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:01:00.0
totalMemory: 5.77GiB freeMemory: 515.75MiB
2019-10-21 21:50:38.419195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-21 21:50:38.419929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-21 21:50:38.419939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-21 21:50:38.419945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-21 21:50:38.420017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 290 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
/usr/local/lib/python2.7/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
2019-10-21 21:50:39,922 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2019-10-21 21:50:41,177 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/evaluation/build_evaluator.pyc: Found 579 samples in validation set
Traceback (most recent call last):
  File "/usr/local/bin/tlt-evaluate", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_evaluate.py", line 38, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/evaluate.py", line 119, in main
  File "./detectnet_v2/evaluation/build_evaluator.py", line 124, in build_evaluator_for_trained_gridbox
  File "./detectnet_v2/model/utilities.py", line 26, in _fn_wrapper
  File "./detectnet_v2/model/detectnet_model.py", line 617, in build_validation_graph
  File "./detectnet_v2/model/utilities.py", line 26, in _fn_wrapper
  File "./detectnet_v2/model/detectnet_model.py", line 582, in build_inference_graph
  File "./detectnet_v2/model/detectnet_model.py", line 243, in predictions_to_dict
  File "./detectnet_v2/objectives/base_objective.py", line 97, in reshape_output
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 401, in call
    return K.reshape(inputs, (K.shape(inputs)[0],) + self.target_shape)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1969, in reshape
    return tf.reshape(x, shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 7179, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1823, in __init__
    control_input_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1662, in _create_c_op
    raise ValueError(str(e))
ValueError: Cannot reshape a tensor with 98304 elements to shape [4,2,4,24,78] (59904 elements) for 'reshape_1_1/Reshape' (op: 'Reshape') with input shapes: [4,8,48,64], [5] and with input tensors computed as partial shapes: input[1] = [4,2,4,24,78].

Can anyone comment as to what may be my issue?

Thanks in advance for any suggestions or insight.

Could you please attach the spec? Thanks.

Thanks, Morgan.

Here is the spec for evaluation (the same as the training spec with an additional dataset_config.validation_data_source entry), specs/detectnet2_resnet18_test.txt:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/experiments/tfrecords/trainval*"
    image_directory_path: "/workspace/experiments/kitti/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "handgun"
    value: "handgun"
  }
  target_class_mapping {
    key: "rifle"
    value: "rifle"
  }
  validation_fold: 0
  validation_data_source: {
    tfrecords_path: "/workspace/experiments/tfrecords/testing*"
    image_directory_path: "/workspace/experiments/kitti/testing"
  }
}
augmentation_config {
  preprocessing {
    output_image_width: 1024
    output_image_height: 768
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "handgun"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.2
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "rifle"
    value {

      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/experiments/saved_output/20191017/weights/model.tlt"
  num_layers: 18
  freeze_blocks: 0
  use_batch_norm: true
  arch: "resnet"
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "handgun"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "rifle"
    value: 0.5
  }
  evaluation_box_config {
    key: "handgun"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "rifle"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "handgun"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "rifle"
    class_weight: 8.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 240
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3.0e-09
  }
  optimizer {
    adam {
      epsilon: 9.9e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "handgun"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "rifle"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.4
}

Hi monocongo,
Could you please double check your test dataset?
From your training spec, the training size is 1024x768. If you use the tlt model to evaluate a 1248x384 test dataset, that must be the culprit.

Note: The training dimension is 1024x768, so the output size is 64x48 (1024/16 is 64, and 768/16 is 48). That matches the [4,8,48,64] input shape in your error.
The other shape in the error, [4,2,4,24,78], is 78x24; multiplied by 16, the corresponding input shape is 1248x384.
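
The arithmetic can be checked with a few lines of Python. This is only a sketch: the stride of 16 is taken from the calculation in this thread, and the helper names are made up for illustration. It also confirms the two element counts quoted in the error message.

```python
# Assumption: the output grid is the input size divided by a stride of 16,
# as in the 1024/16 = 64 calculation in this thread.
STRIDE = 16


def output_grid(width, height, stride=STRIDE):
    """Return (grid_width, grid_height) for a given input resolution."""
    return width // stride, height // stride


def num_elements(shape):
    """Return the number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


print(output_grid(1024, 768))            # (64, 48)
print(output_grid(1248, 384))            # (78, 24)
print(num_elements([4, 8, 48, 64]))      # 98304
print(num_elements([4, 2, 4, 24, 78]))   # 59904
```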

Thanks, Morgan.

I have double-checked my testing dataset and all the images are 1024x768. So perhaps the conversion from KITTI to TFRecords has somehow reshaped to 1248x384? Below is the spec file I used for the conversion from KITTI to TFRecords with tlt-dataset-convert. Nothing appears to be amiss, right?

kitti_config {
  root_directory_path: "/workspace/experiments/kitti/testing"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 0
  num_shards: 10
}

Can you recommend a visualization tool or some other means of inspecting the testing dataset TFRecords to confirm that they are in fact using the correct resolution of 1024x768?

Hi monocongo,
Please use the Python script below to check the width/height of your test dataset’s tfrecord generated by tlt-dataset-convert.
Simply change line 5 (the tf_file variable) to your actual tfrecord file.

from __future__ import print_function

import tensorflow as tf

tf_file = './pascal_voc_07-fold-000-of-002-shard-00000-of-00010'  # change to your actual tfrecord file

def extract_tfrecords_features(tfrecords_file):
    """Extract features in a tfrecords file for parsing a series of tfrecords files."""
    tfrecords_iterator = tf.python_io.tf_record_iterator(tfrecords_file)

    for record in tfrecords_iterator:

        example = tf.train.Example()
        example.ParseFromString(record)

        features = example.features.feature
        #print("the features is:{}\n".format(features))

        frame_id = features['frame/id'].bytes_list.value
        print("The frame_id is:{}\n".format(frame_id))

        frame_width = features['frame/width'].int64_list.value
        print("The frame_width is:{}\n".format(frame_width))

        frame_height = features['frame/height'].int64_list.value
        print("The frame_height is:{}\n".format(frame_height))

if __name__ == '__main__':
    extract_tfrecords_features(tf_file)

Thanks for your help, Morganh.

I have updated the code (attached below) and have run it on all TFRecord files in my testing dataset. All files appear to be correctly sized, i.e. with 1024x768 resolution (WxH). Can you recommend anything else I might try to get to the bottom of this?

"""
Example usage:

python3 inspect_tfrecords.py \
    --tfrecords_dir /home/james/nvidia/tlt/experiments/tfrecords/testing \
    --expected_width 1024 \
    --expected_height 768
"""

import argparse
import os

import tensorflow as tf


# ------------------------------------------------------------------------------
def validate_image_features(
        tfrecords_file: str,
        expected_width: int,
        expected_height: int,
) -> dict:
    """
    Report any image features in a TFRecord file that don't match with expected
    image dimensions.

    :param tfrecords_file: TFRecord file path
    :param expected_width: expected width of all images in the TFRecord file
    :param expected_height: expected height of all images in the TFRecord file
    :return: a dictionary with details of all images that don't contain the
        expected width or height, with image IDs as keys mapped to a dictionary
        value with width and height entries
    """
    unexpected_details = {}
    tfrecords_iterator = tf.io.tf_record_iterator(tfrecords_file)
    for record in tfrecords_iterator:

        example = tf.train.Example()
        example.ParseFromString(record)
        features = example.features.feature

        # each of these features holds a single value per record
        frame_id = features['frame/id'].bytes_list.value[0].decode('utf-8')
        frame_width = int(features['frame/width'].int64_list.value[0])
        frame_height = int(features['frame/height'].int64_list.value[0])

        if (frame_width != expected_width) or (frame_height != expected_height):
            dimensions = {
                "width": frame_width,
                "height": frame_height,
            }
            unexpected_details[frame_id] = dimensions

    return unexpected_details


# ------------------------------------------------------------------------------
if __name__ == '__main__':

    # parse the command line arguments
    args_parser = argparse.ArgumentParser()
    args_parser.add_argument(
        "--tfrecords_dir",
        required=True,
        type=str,
        help="Directory containing the TFRecord files to be inspected",
    )
    args_parser.add_argument(
        "--expected_width",
        required=True,
        type=int,
        help="Expected width of images",
    )
    args_parser.add_argument(
        "--expected_height",
        required=True,
        type=int,
        help="Expected height of images",
    )
    args = vars(args_parser.parse_args())

    # loop over each TFRecord file in the specified dataset directory
    tfrecord_files = os.listdir(args["tfrecords_dir"])
    for tfrecord_file in tfrecord_files:

        # find any images with unexpected dimensions
        tfrecord_file_path = os.path.join(args["tfrecords_dir"], tfrecord_file)
        unexpected = validate_image_features(tfrecord_file_path, args["expected_width"], args["expected_height"])

        # display results for the current TFRecord file
        if len(unexpected) > 0:
            print(f"Images found in {tfrecord_file_path} with unexpected dimensions:")
            for k, v in unexpected.items():
                print(f"Image ID: {k}\n\tWidth: {v['width']}\n\tHeight: {v['height']}")
        else:
            print(f"All images found in {tfrecord_file_path} have expected dimensions")

    # successful completion
    exit(0)

I also see that you set val_split to 0. Is there a reason for this? The supported values mentioned in the tlt doc are 1-100.

Also, see section 7.2 of the tlt doc: you need to update the dataloader configuration part of the training spec file.
It seems that yours has not been changed.

validation_data_source: {
    tfrecords_path: "/workspace/experiments/tfrecords/testing*"
    image_directory_path: "/workspace/experiments/kitti/testing"

Yes, when I created the testing dataset my assumption was that it would be used as a whole so I did not make a validation split. My thinking was that when training happens the training portion of the “trainval” dataset is used for training and the validation portion of the “trainval” dataset is used for validation, i.e. mAP calculation, etc. For evaluation, I assumed that the entire testing dataset will be run through the model for evaluation of the trained model’s inferencing performance/accuracy. For this reason, I thought that there’d be no purpose to a validation split for the testing dataset, hence the val_split setting of 0.

Am I off-base in my understanding of this process? Please explain how this works if you can (I’m new to this and the code is not open source so I can’t easily figure this out on my own). Thanks!

Thanks, Morganh. I’m not sure I follow you – my spec file for evaluation is exactly the same as the spec file for training, only with an additional validation_data_source entry, as per the instructions in section 7.2 of the documentation.

In other words, I have a training spec file that has no validation_data_source entry, and this is the spec file that’s used for training. I have a separate evaluation spec file that is an exact duplicate of the training spec file but with an additional validation_data_source entry that references the testing dataset, as instructed by the documentation in section 7.2. This is the spec file I use in my evaluation command. The testing dataset has been validated, i.e. shown to have the same resolution as the training dataset (1024x768) but still the reshaping issue is occurring.

Have I missed something in section 7.2 that should be different from the situation described above? Because it seems that I’ve correctly followed the instructions there, please advise if not.

Hi monocongo
Could you please check your training log to make sure that “output/weights/model.tlt” was trained at 1024x768?
Is it possible that you previously trained a tlt model which is not exactly 1024x768?

Hi monocongo,
I set up an experiment and have found the reason. Please make sure that output_image_height/width in the spec are the same as the height/width of the network input.

== my experiment ==
If I change the spec used in tlt-evaluate, I can reproduce an error similar to yours.
The output image width/height of 1024x768 does not match the input shape (512x512) of my tlt model.

augmentation_config {
  preprocessing {
    output_image_width: 1024
    output_image_height: 768

== my log ==
ValueError: Cannot reshape a tensor with 3932160 elements to shape [16,20,4,32,32] (1310720 elements) for 'reshape_1_1/Reshape' (op: 'Reshape') with input shapes: [16,80,48,64], [5] and with input tensors computed as partial shapes: input[1] = [16,20,4,32,32].

== your log ==
ValueError: Cannot reshape a tensor with 98304 elements to shape [4,2,4,24,78] (59904 elements) for 'reshape_1_1/Reshape' (op: 'Reshape') with input shapes: [4,8,48,64], [5] and with input tensors computed as partial shapes: input[1] = [4,2,4,24,78].
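
To read an error like these quickly, the two shapes can be pulled out of the message and scaled back to image resolutions. The snippet below is a rough sketch, not part of TLT: the function names are hypothetical, the stride of 16 is assumed from the arithmetic earlier in this thread, and the last two dimensions of each shape are assumed to be grid height and grid width. Going by the experiment above, the "input shapes" tensor appears to follow the spec used for tlt-evaluate and the reshape target appears to follow the model.

```python
import re

STRIDE = 16  # assumption: same stride as the 1024/16 = 64 arithmetic in this thread


def _dims(match):
    """Convert a regex match on text like '4,8,48,64' into a list of ints."""
    return [int(d) for d in match.group(1).split(",")]


def shapes_from_reshape_error(message):
    """Return (source_shape, target_shape) parsed from a reshape error message."""
    source = re.search(r"input shapes: \[([\d,]+)\]", message)
    target = re.search(r"to shape \[([\d,]+)\]", message)
    if source is None or target is None:
        raise ValueError("not a recognizable reshape error message")
    return _dims(source), _dims(target)


def implied_resolution(shape, stride=STRIDE):
    """Treat the last two dims as (grid_height, grid_width) and scale by stride."""
    grid_height, grid_width = shape[-2], shape[-1]
    return grid_width * stride, grid_height * stride


message = (
    "Cannot reshape a tensor with 98304 elements to shape [4,2,4,24,78] "
    "(59904 elements) for 'reshape_1_1/Reshape' (op: 'Reshape') with input "
    "shapes: [4,8,48,64], [5]"
)
source_shape, target_shape = shapes_from_reshape_error(message)
print(implied_resolution(source_shape))  # (1024, 768)
print(implied_resolution(target_shape))  # (1248, 384)
```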

Thanks so much for your continuing help with this, Morganh.

Using the script I attached above I validated my training dataset that I assumed to have all images with 1024x768 resolution (the testing dataset already validated well showing all images at 1024x768 resolution). Surprisingly it turned out that there were multiple images in the training dataset that were not at 1024x768 resolution. For example:

Images found in /home/james/nvidia/tlt/experiments/tfrecords/training/trainval-fold-001-of-002-shard-00001-of-00010 with unexpected dimensions:
Image ID: image_2/016ae9bb1b4cc4be
    Width: 1024
    Height: 758
Image ID: image_2/45b2d5d14b97d6f5
    Width: 1024
    Height: 683
Image ID: image_2/00000901
    Width: 375
    Height: 281
Image ID: image_2/armas_1147
    Width: 620
    Height: 350
Image ID: image_2/armas_1671
    Width: 500
    Height: 375
Image ID: image_2/armas_2876
    Width: 400
    Height: 282
Image ID: image_2/armas_2169
    Width: 160
    Height: 120
 ...

So it looks like at some point I managed to use unresized images and corresponding KITTI files to create my TFRecords for input. I guess this escaped my attention because my understanding was that the model won’t train with inputs that 1) are non-uniform in size or 2) are not at a resolution where both width and height are multiples of 16.

I have regenerated the training dataset using images and KITTI files correctly sized to 1024x768. After training the model using this dataset I can now evaluate the model using tlt-evaluate and the reshape issue I was seeing has disappeared.

Can anyone comment as to why the model seems to have initially trained OK with input images at a resolution other than what is specified in the documentation:


DetectNet_v2

Input size: C * W * H (where C = 1 or 3, W >= 480, H >= 272, and W, H are multiples of 16)
Image format: JPG, JPEG, PNG
Label format: KITTI detection

Note: The tlt-train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.

In any event, this issue is resolved, it’s just not clear yet as to why it happened in the first place if the non-uniform/non-compliant sizing of the input images was in fact the root cause of the error.
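
For anyone hitting the same thing, the size rules quoted above can be turned into a quick check to run on the widths and heights reported by the TFRecord inspection script. This is a minimal sketch: the function name and the default expected size of 1024x768 are my own choices, and the thresholds are simply copied from the documentation excerpt.

```python
def size_problems(width, height, expected=(1024, 768)):
    """Return a list of reasons why (width, height) breaks the quoted rules."""
    problems = []
    if width < 480:
        problems.append("width below 480")
    if height < 272:
        problems.append("height below 272")
    if width % 16 != 0:
        problems.append("width is not a multiple of 16")
    if height % 16 != 0:
        problems.append("height is not a multiple of 16")
    if (width, height) != expected:
        problems.append("differs from expected %dx%d" % expected)
    return problems


# a few of the sizes reported for my training TFRecords
for size in [(1024, 768), (1024, 758), (375, 281), (160, 120)]:
    print(size, size_problems(*size) or "OK")
```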

Hi monocongo,
Glad to know you fixed the issue!
During training, there is a crop step that crops the images to the same size. If the original image is smaller than the model input size, the crop becomes padding.
See section 5.2.6 for more info.
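
As a rough illustration of that crop-or-pad behavior, the sketch below computes, per axis, how many source pixels are kept and how many pixels would have to be padded to reach the model input size. It is not TLT's implementation: the function names are hypothetical, and it says nothing about where the crop window is placed or what value is used for padding.

```python
def crop_or_pad_axis(source, target):
    """Return (kept, padded) pixel counts when fitting source into target."""
    kept = min(source, target)          # larger sources are cropped down
    padded = max(0, target - source)    # smaller sources are padded up
    return kept, padded


def fit_to_model_input(src_w, src_h, dst_w, dst_h):
    """Apply the per-axis rule to both width and height."""
    return {
        "width": crop_or_pad_axis(src_w, dst_w),
        "height": crop_or_pad_axis(src_h, dst_h),
    }


# a 375x281 image fitted to a 1024x768 model input is padded on both axes
print(fit_to_model_input(375, 281, 1024, 768))
# a 1248x384 image is cropped in width and padded in height
print(fit_to_model_input(1248, 384, 1024, 768))
```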

Hi Morganh
I am having the same problem when retraining with the KITTI dataset.
The images in the KITTI dataset vary in size by a few pixels. I could train detectnet_v2 without a problem, but I could not retrain the model after pruning.
Could you clarify whether the preprocessing module does not work when retraining?
I also tried to use “enable_auto_resize”, but the following error came up.

google.protobuf.text_format.ParseError: 37:5 : Message type "AugmentationConfig.Preprocessing" has no field named "enable_auto_resize".
Traceback (most recent call last):

I am using TLT 3.0, and the unpruned detectnet model is tlt_peoplenet:unpruned_v2.1.

@luis.marval
Please create a new topic in TLT forum. Thanks.