Error while using vanilla_unet on TLT 3.0


I am using the spec below to train a UNet for a binary segmentation task.

random_seed: 42
model_config {
  model_input_width: 1024
  model_input_height: 1280
  model_input_channels: 3
  arch: "vanilla_unet"
  use_batch_norm: true
  training_precision {
    backend_floatx: FLOAT32
  }
}
training_config {
  batch_size: 16
  epochs: 10
  log_summary_steps: 10
  checkpoint_interval: 1
  loss: "dice"
  regularizer {
    type: L2
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
}
dataset_config {
  dataset: "custom"
  augment: True
  input_image_type: "color"
  data_class_config {
    target_classes {
      name: "foreground"
      mapping_class: "foreground"
      label_id: 0
    }
    target_classes {
      name: "background"
      mapping_class: "background"
      label_id: 1
    }
  }
}


I am getting the following error :

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 256, 280, 216), (None, 256, 281, 217)]
Traceback (most recent call last):
  File "/usr/local/bin/unet", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/entrypoint/", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/", line 296, in launch_job
AssertionError: Process run failed.
2021-04-19 00:29:40,037 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The input image dimensions, 1024 and 1280, are multiples of 16, which is the requirement for UNet training.
I don’t know what I did wrong. I even tried setting both height and width to 320, and that didn’t work either.

Would anyone please help me with this issue?


See the “Open Model Architectures” page of the Transfer Learning Toolkit 3.0 documentation.
Can you check whether all of the images and masks are the same size?


  • Input size : C * W * H (where C = 3, W >= 128, H >= 128, and W, H are multiples of 32)
  • Image format : JPG, JPEG, PNG, BMP
  • Label format : Image/Mask pair


The train tool does not support training on images of multiple resolutions. All of the images and masks must be of equal size. However, the images and masks need not be equal to the model input size. The images/masks will be resized to the model input size during training.
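
One quick way to verify the equal-size requirement before training is to group the files in each folder by resolution. The following is only a minimal sketch using Pillow; `group_by_size` and `read_sizes` are hypothetical helper names and are not part of TLT, and the folders to scan are passed on the command line.

```python
import sys
from collections import defaultdict
from pathlib import Path


def group_by_size(sizes):
    """Map each (width, height) to the list of file names that have it."""
    groups = defaultdict(list)
    for name, size in sizes.items():
        groups[tuple(size)].append(name)
    return dict(groups)


def read_sizes(folder, extensions=(".jpg", ".jpeg", ".png", ".bmp")):
    """Read the (width, height) of every image file directly inside a folder."""
    from PIL import Image  # Pillow; imported here so group_by_size works without it

    sizes = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in extensions:
            with Image.open(path) as img:
                sizes[path.name] = img.size  # Pillow reports (width, height)
    return sizes


if __name__ == "__main__":
    # Usage: python check_sizes.py <images_folder> <masks_folder>
    for folder in sys.argv[1:]:
        groups = group_by_size(read_sizes(folder))
        print(folder, {size: len(names) for size, names in groups.items()})
```

If any folder reports more than one resolution, or the image and mask folders report different resolutions, the dataset does not meet the requirement quoted above.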

I checked the input size. All W, H values are the same. However, if I change the “arch” parameter to vgg or resnet, it seems to work fine. I don’t know what is wrong with vanilla_unet.

To narrow this down, could you try running the default Jupyter notebook with vanilla_unet, if possible?

Hi @vgsharsha20 ,
Sorry about that. For vanilla_unet, model_input_width and model_input_height should both be 572.
Vanilla UNet supports only 572.
We will improve the user guide to document this requirement.
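
For anyone curious why other sizes fail at the Concatenate layer, here is a rough sketch of the size arithmetic. It assumes vanilla_unet follows the original U-Net layout: unpadded 3x3 convolutions, 2x2 max pooling, 2x up-convolutions, and a symmetric integer crop of the encoder feature map before each concatenation. That layout is an assumption on my part and is not confirmed by the TLT documentation; `unet_skip_sizes` is a hypothetical helper name.

```python
def unet_skip_sizes(size, depth=4):
    """Trace one spatial dimension through an unpadded U-Net.

    Returns (upsampled_decoder_size, cropped_encoder_size) for each skip
    connection, deepest level first. ASSUMPTION: the layer arithmetic follows
    the original U-Net paper; TLT's actual implementation may differ.
    """
    encoder_sizes = []
    s = size
    for _ in range(depth):
        s -= 4                      # two unpadded 3x3 convolutions, -2 each
        encoder_sizes.append(s)     # feature map kept for the skip connection
        s //= 2                     # 2x2 max pooling rounds down
    s -= 4                          # two convolutions at the bottleneck
    pairs = []
    for enc in reversed(encoder_sizes):
        s *= 2                      # 2x up-convolution
        crop = (enc - s) // 2       # same integer crop on both sides
        pairs.append((s, enc - 2 * crop))
        s -= 4                      # two unpadded convolutions after the concat
    return pairs


for size in (572, 1280, 1024, 320):
    mismatches = [(d, e) for d, e in unet_skip_sizes(size) if d != e]
    print(size, "OK" if not mismatches else f"mismatch {mismatches}")
```

Under these assumptions, 572 matches at every level, while 1280 and 1024 fail at the second-deepest skip connection with 280 vs 281 and 216 vs 217, the same numbers that appear in the error message. An input of 320 fails the same way (40 vs 41), because an odd-sized difference cannot be cropped evenly from both sides.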