Tensor reshape error when evaluating a Detectnet_v2 model

monocongo · October 25, 2019, 4:13pm

Thanks so much for your continuing help with this, Morganh.

Using the script I attached above, I validated my training dataset, which I had assumed contained only 1024x768 images (the testing dataset had already validated cleanly, with every image at 1024x768). Surprisingly, multiple images in the training dataset were not at 1024x768. For example:

Images found in /home/james/nvidia/tlt/experiments/tfrecords/training/trainval-fold-001-of-002-shard-00001-of-00010 with unexpected dimensions:
Image ID: image_2/016ae9bb1b4cc4be
    Width: 1024
    Height: 758
Image ID: image_2/45b2d5d14b97d6f5
    Width: 1024
    Height: 683
Image ID: image_2/00000901
    Width: 375
    Height: 281
Image ID: image_2/armas_1147
    Width: 620
    Height: 350
Image ID: image_2/armas_1671
    Width: 500
    Height: 375
Image ID: image_2/armas_2876
    Width: 400
    Height: 282
Image ID: image_2/armas_2169
    Width: 160
    Height: 120
 ...

So it looks like at some point I used unresized images and their corresponding KITTI files to create my input TFRecords. This escaped my attention, I guess, because my understanding was that the model won’t train if the inputs are 1) non-uniform in size or 2) not at a resolution where both width and height are multiples of 16.

I have regenerated the training dataset using images and KITTI files correctly sized to 1024x768. After training the model using this dataset I can now evaluate the model using tlt-evaluate and the reshape issue I was seeing has disappeared.
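For anyone repeating this regeneration step, here is a minimal sketch of how the bounding box in one KITTI label line could be rescaled when its image is resized. It assumes the standard KITTI detection layout, where fields 4 to 7 (zero-based) hold the 2D box as left, top, right, bottom in pixels. The function name `scale_kitti_line` and the sample label are hypothetical, not taken from my dataset or from TLT itself.

```python
def scale_kitti_line(line, old_size, new_size):
    """Rescale the 2D bbox of one KITTI label line for a resized image.

    old_size and new_size are (width, height) tuples in pixels.
    Assumes fields 4..7 are bbox left, top, right, bottom (KITTI detection format).
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    sx = new_w / old_w  # horizontal scale factor
    sy = new_h / old_h  # vertical scale factor

    fields = line.split()
    if len(fields) < 8:
        raise ValueError("not a KITTI detection line: %r" % line)

    left, top, right, bottom = (float(v) for v in fields[4:8])
    fields[4:8] = [
        "%.2f" % (left * sx),
        "%.2f" % (top * sy),
        "%.2f" % (right * sx),
        "%.2f" % (bottom * sy),
    ]
    return " ".join(fields)


# Hypothetical label for a 500x375 image being resized to 1024x768.
sample = "gun 0.00 0 0.00 100.00 50.00 200.00 150.00 0 0 0 0 0 0 0"
scaled = scale_kitti_line(sample, (500, 375), (1024, 768))
```

Note that this stretches the box independently in x and y, so it only matches the image if the image itself was resized the same way (without padding or cropping).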

Can anyone comment on why the model seems to have initially trained OK with input images at resolutions other than what is specified in the documentation:

DetectNet_v2

Input size: C * W * H (where C = 1 or 3, W >= 480, H >= 272, and W, H are multiples of 16)
Image format: JPG, JPEG, PNG
Label format: KITTI detection

Note: The tlt-train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
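The documented constraints quoted above can be turned into a quick pre-flight check to run before building TFRecords. This is only a sketch: the limits (W >= 480, H >= 272, multiples of 16) come from the documentation excerpt, while the function name `size_problems` and the optional `expected` argument are hypothetical. It checks plain width/height numbers, so reading them from the image files is left to whatever image library is at hand.

```python
def size_problems(width, height, expected=None):
    """Return a list of reasons why (width, height) breaks the documented
    DetectNet_v2 input constraints. An empty list means the size looks OK.

    expected: optional (width, height) that every image should match,
    since tlt-train does not support mixed resolutions.
    """
    problems = []
    if width < 480:
        problems.append("width %d < 480" % width)
    if height < 272:
        problems.append("height %d < 272" % height)
    if width % 16 != 0:
        problems.append("width %d is not a multiple of 16" % width)
    if height % 16 != 0:
        problems.append("height %d is not a multiple of 16" % height)
    if expected is not None and (width, height) != tuple(expected):
        problems.append("size %dx%d differs from expected %dx%d"
                        % (width, height, expected[0], expected[1]))
    return problems


# Sizes taken from the validation output earlier in this post.
for w, h in [(1024, 768), (1024, 758), (160, 120)]:
    issues = size_problems(w, h, expected=(1024, 768))
    print("%dx%d: %s" % (w, h, "; ".join(issues) if issues else "OK"))
```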

In any event, this issue is resolved; it’s just not clear yet why it happened in the first place, if the non-uniform/non-compliant sizing of the input images was in fact the root cause of the error.
