tlt-infer ValueError: could not broadcast input array from shape (3,300,224) into shape (3,224,300)

Hi guys,
tlt-infer fails with the following error:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-infer", line 10, in <module>
  File "./common/", line 26, in main
  File "./makenet/scripts/", line 185, in main
  File "./makenet/scripts/", line 159, in inference
  File "./makenet/scripts/", line 93, in load_image_batch
ValueError: could not broadcast input array from shape (3,300,224) into shape (3,224,300)

My command is:

%env EPOCH=080
!tlt-infer classification -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
                          -k $KEY -b 32 -d $DATA_DOWNLOAD_DIR/test/male \
                          -cm $USER_EXPERIMENT_DIR/output_retrain/classmap.json

How can I fix it?

Hi ChuongPhung,
What is your input_image_size setting in the classification spec file?

My input_image_size setting in the classification spec file is 3,224,300.

If I set input_image_size to 3,300,300, everything works, but with the configuration above I get the error.

Hi ChuongPhung,
Could you please paste the full log from running tlt-infer when you get the error?
Also, could you please paste the spec file you used to train the tlt model (resnet_$EPOCH.tlt)?

Hi ChuongPhung,
There is an issue with non-square input_image_size settings in “tlt-infer”.
We will fix it. Thanks very much for the finding.
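
For anyone hitting the same message, here is a minimal NumPy sketch of how this kind of error can arise. It is not the tlt-infer source; the helper name load_into_batch and the idea that height and width get swapped somewhere in loading are assumptions made for illustration only. It also shows why a square size such as 3,300,300 would hide the problem.

```python
import numpy as np

def load_into_batch(image_chw, batch_shape):
    """Hypothetical helper: copy one CHW image into slot 0 of a batch buffer."""
    batch = np.zeros(batch_shape, dtype=np.float32)
    batch[0] = image_chw  # raises ValueError if the shapes are incompatible
    return batch

# Spec says channels=3, height=224, width=300.
spec_chw = (3, 224, 300)

# Assumption: the image arrives with height and width swapped.
swapped = np.zeros((3, 300, 224), dtype=np.float32)

try:
    load_into_batch(swapped, (1,) + spec_chw)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# With a square size, swapping height and width changes nothing,
# so the same code path succeeds.
square_ok = load_into_batch(np.zeros((3, 300, 300), dtype=np.float32),
                            (1, 3, 300, 300))
print(square_ok.shape)
```
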

Hello, is there any update to this issue?
I am running into the same error. Also how would we go about getting the update?

Please use the 1.0.1 TLT docker in
It has been available since last week.

Hi Morganh,
Thank you for that information.
Also, I wanted to find out how the “input dataloader” handles input images whose size differs from the spec: how exactly does the resize work?
For example, if the input is 600x400 and I specified 768x384, how would it treat the image, given that the aspect ratios differ?
Would it crop the image, or would it add padding?

What’s the best practice for image sizing on training data?

Thank you,
Vandan Patel

Hi tech.h4x0rz,
For the classification network in TLT, the dataloader resizes input images whose width or height differs from the target size set in the spec. A PIL interpolation method is used to resize the images.
Also, from the TLT user guide:
Note: Classification input images do not need to be manually resized. The input dataloader resizes images as needed.
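
To illustrate what a plain PIL resize does with the 600x400 to 768x384 example above, here is a small sketch. It is not the TLT dataloader itself: the helper resize_to_spec is hypothetical, bilinear interpolation is an assumption (the thread does not say which PIL method TLT uses), and 768x384 is read as width x height. A plain resize neither crops nor pads; it stretches the image to the target size, so the aspect ratio changes.

```python
from PIL import Image
import numpy as np

def resize_to_spec(img, height, width, resample=Image.BILINEAR):
    """Hypothetical helper: plain resize to the target size, no crop, no padding.

    Note that PIL's Image.resize takes (width, height), while the spec's
    input_image_size is ordered channels, height, width.
    """
    return img.resize((width, height), resample)

# A blank 600x400 (width x height) image stands in for a training sample.
src = Image.new("RGB", (600, 400))
out = resize_to_spec(src, height=384, width=768)

# Convert HWC pixel data to the CHW layout used by input_image_size.
chw = np.asarray(out).transpose(2, 0, 1)

print(out.size)   # PIL reports (width, height)
print(chw.shape)  # channels, height, width

# Aspect ratio before and after: the image is stretched horizontally.
ratio_before = src.size[0] / src.size[1]
ratio_after = out.size[0] / out.size[1]
print(ratio_before, ratio_after)
```

If distortion matters for your classes, one option is to resize or pad the training images yourself to the target aspect ratio before handing them to the dataloader.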