Error while using vanilla_unet on tlt 3.0

vgsharsha20 · April 19, 2021, 4:41am

Hi,

I am using the spec file below to train UNet on a binary segmentation task.

random_seed: 42
model_config {
  model_input_width: 1024
  model_input_height: 1280
  model_input_channels: 3
  arch: "vanilla_unet"
  use_batch_norm: true
  training_precision {
    backend_floatx: FLOAT32
  }
}

training_config {
  batch_size: 16
  epochs: 10
  log_summary_steps: 10
  checkpoint_interval: 1
  loss: "dice"
  learning_rate: 0.0001
  regularizer {
    type: L2
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
}

dataset_config {
  dataset: "custom"
  augment: True
  input_image_type: "color"
  train_images_path: "/workspace/tlt-experiments/data/endoData/images/train"
  train_masks_path: "/workspace/tlt-experiments/data/endoData/masks/train"

  val_images_path: "/workspace/tlt-experiments/data/endoData/images/val"
  val_masks_path: "/workspace/tlt-experiments/data/endoData/masks/val"

  test_images_path: "/workspace/tlt-experiments/data/endoData/images/test"

  data_class_config {
    target_classes {
      name: "foreground"
      mapping_class: "foreground"
      label_id: 0
    }
    target_classes {
      name: "background"
      mapping_class: "background"
      label_id: 1
    }
  }
}

I am getting the following error:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 256, 280, 216), (None, 256, 281, 217)]
Traceback (most recent call last):
  File "/usr/local/bin/unet", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/entrypoint/unet.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
2021-04-19 00:29:40,037 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The input image dimensions, 1024 and 1280, are both multiples of 16, which is the requirement for UNet training.
I don't know where I went wrong. I even tried setting both height and width to 320, and that didn't work either.

Would anyone please help me with this issue?

Thanks,
Harsha.

Morganh · April 19, 2021, 7:51am

See https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/open_model_architectures.html#unet.
Can you check whether all of the images and masks are the same size?

UNet

Input size : C * W * H (where C = 3, W >= 128, H >= 128, and W, H are multiples of 32)

Image format : JPG, JPEG, PNG, BMP

Label format : Image/Mask pair

Note

The train tool does not support training on images of multiple resolutions. All of the images and masks must be of equal size. However, image and masks need not be necessarily equal to model input size. The images/masks will be resized to the model input size during training.
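
The two constraints quoted above (a valid model input size, and one common resolution across all images and masks) can be checked before launching a training run. The following is a minimal sketch, not part of TLT: the function names are hypothetical, and `read_sizes` assumes the Pillow package is installed.

```python
from collections import Counter
from pathlib import Path


def valid_unet_input_size(width, height):
    """Check the documented rule: W, H >= 128 and both multiples of 32."""
    return all(d >= 128 and d % 32 == 0 for d in (width, height))


def size_outliers(sizes):
    """Given {filename: (width, height)}, return the entries whose size
    differs from the most common size in the set."""
    if not sizes:
        return {}
    common, _ = Counter(sizes.values()).most_common(1)[0]
    return {name: s for name, s in sizes.items() if s != common}


def read_sizes(folder):
    """Collect (width, height) for every supported image in a folder.
    Assumes Pillow is available; imported here so the helpers above
    work without it."""
    from PIL import Image
    sizes = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in {".jpg", ".jpeg", ".png", ".bmp"}:
            with Image.open(path) as img:
                sizes[path.name] = img.size  # Pillow reports (width, height)
    return sizes
```

For example, `size_outliers(read_sizes("/workspace/tlt-experiments/data/endoData/images/train"))` should return an empty dict when every training image has the same resolution; the same call can be repeated for the mask folders.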

vgsharsha20 · April 20, 2021, 2:48am

I checked the input size. All W and H values are the same. However, if I change the "arch" parameter to vgg or resnet, training seems to work fine. I don't know what is wrong with vanilla_unet.

Morganh · April 20, 2021, 6:28am

To narrow this down, could you try running the default Jupyter notebook with vanilla_unet, if possible?

Morganh · April 22, 2021, 1:27am

Hi @vgsharsha20 ,
Sorry about that. For vanilla_unet, model_input_width and model_input_height should both be 572.
Vanilla UNet supports only 572.
We will update the user guide to document this requirement.
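
The shapes in the error message are consistent with the original U-Net design, which is one plausible reason other input sizes fail. The sketch below is only an illustration under assumptions, not TLT's actual code: it assumes unpadded 3x3 convolutions (two per level), 2x2 max pooling with floor rounding, 2x upsampling, and a symmetric center crop of each skip connection. The function names are hypothetical.

```python
def trace_skip_pairs(size, depth=4):
    """Trace one spatial dimension through an assumed original-style U-Net.

    Returns one (upsampled_size, cropped_skip_size) pair per decoder level,
    deepest level first. The two values must match for concatenation.
    """
    skips = []
    s = size
    for _ in range(depth):
        s -= 4          # two unpadded 3x3 convolutions remove 2 pixels each
        skips.append(s)
        s //= 2         # 2x2 max pooling rounds odd sizes down
    s -= 4              # bottleneck convolutions
    pairs = []
    for skip in reversed(skips):
        up = s * 2                   # 2x upsampling
        crop = (skip - up) // 2      # symmetric crop on each side
        pairs.append((up, skip - 2 * crop))
        s = up - 4                   # decoder convolutions
    return pairs


def first_mismatch(size, depth=4):
    """Return the first (upsampled, skip) pair that differs, or None."""
    for up, skip in trace_skip_pairs(size, depth):
        if up != skip:
            return (up, skip)
    return None


for size in (572, 1280, 1024, 320):
    print(size, first_mismatch(size))
```

Under these assumptions, 572 passes cleanly, while 1280 and 1024 fail with 280 vs 281 and 216 vs 217, the same values as in the reported `Concatenate` error. An odd size reaching a pooling layer loses a pixel that the symmetric crop cannot recover, which would also explain why 320 failed.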