Training multi-class UNet does not converge

Thanks for the finding. I will double-check the AMP issue.
Also, the latest version, 21.08 (nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.08-py3), includes a change for resizing and visualization in inference, so I suggest using it if you have time.

So I still think that the NaN problem was solved by disabling AMP. However, I’m still unable to get very good results. I’m now using the new TAO docker image and the following spec file.
model.txt (2.8 KB)

After 120 epochs, the inference result (colorized with my own colormap) looks like this:

As a comparison, the ground truth with the same colormap looks like this:

The loss looks like this:
tao_loss

While the loss is not diverging (like it did with AMP), it’s not really converging either. What we see here is the result after 210,000 steps, which took about 24 hours to train. I also tried some intermediate .tlt files, and they didn’t look any better.

Are there any publicly available multi-class datasets that are shown to work well with UNet? The other topic used Mapillary Vistas to get reasonable results, but the masks in Mapillary Vistas require some amount of processing before they can be given to UNet. So I’m wondering if there is a dataset that would work out-of-the-box.

Thanks for the result. Could you explain more about your finding that “the masks in Mapillary Vistas require some amount of processing before they can be given to UNet”?

I thought the masks were 3-channel images, but when I took another look, they are actually single-channel 8-bit images, which should work directly with UNet. For some reason, the mapping between class names and class numbers is done via the 3-channel colormap values that are also included in the PNG files. That’s a bit confusing, but it should not affect training with UNet. So I’m going to see how the training works with Mapillary Vistas.
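
To double-check the label format, something like the following can be used (the file path is just a placeholder; this only illustrates the kind of check I mean):

import numpy as np
from PIL import Image

# Open one Mapillary Vistas v2.0 label (placeholder path).
label = Image.open('/path/to/mapillary_vistas_2/training/v2.0/labels/example.png')
print(label.mode)               # 'P': single-channel palette image
arr = np.array(label)           # palette indices, i.e. the class IDs
print(arr.shape, arr.dtype)     # (height, width) uint8
print(np.unique(arr))           # class IDs present in this label
print(label.getpalette()[:12])  # first few RGB colormap entries (used only for visualization)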

I tried Mapillary Vistas but it didn’t change anything. The spec file is the same as in the other topic, except for the file paths:
model.txt (19.4 KB)

I launched the docker container like this:

docker run -it --gpus all \
  -v /path/to/mapillary_vistas_2:/mapillary_vistas_2:ro \
  -v /path/to/my_folder:/workspace/my_folder \
  nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.08-py3

I downloaded the pretrained model:

ngc registry model download-version nvidia/tao/pretrained_semantic_segmentation:resnet18

Then I trained the model like this:

unet train \
  -e /workspace/my_folder/model.txt \
  -m /workspace/pretrained_semantic_segmentation_vresnet18/resnet_18.hdf5 \
  -r /workspace/my_folder/output \
  -k my_key

This is the training loss:
mv2_loss

And then I run inference like this:

unet inference \
  -e /workspace/my_folder/model.txt \
  -m /workspace/my_folder/output/weights/model.tlt \
  -o /workspace/my_folder/infer \
  -k my_key

This is the result (I resized the image before uploading it):

I have also tried another computer (a GTX 1080 with 465.19.01 drivers), and I have tried renaming all of the files so that the filenames consist of sequential numbers with leading zeros, but neither change made any difference. I don’t know what to try next.

Not sure what is happening. I am asking for more info in other topics.

Please try again with the previous 3.0-py3 docker.
But please note the following restriction in that version (see Open Model Architectures — Transfer Learning Toolkit 3.0 documentation):

The train tool does not support training on images of multiple resolutions. All of the images and masks must be of equal size. However, image and masks need not be necessarily equal to model input size. The images/ masks will be resized to the model input size during training.

You need to resize the images and labels to be of equal size.
Also, please note that the model input size should be a multiple of 32.
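
A quick way to check both requirements is a small script like this (directory paths are placeholders):

import os
from PIL import Image

IMAGE_DIR = '/path/to/images'  # placeholder
MASK_DIR = '/path/to/masks'    # placeholder

# 1. All images and masks must share one and the same resolution.
sizes = set()
for d in (IMAGE_DIR, MASK_DIR):
    for f in os.listdir(d):
        with Image.open(os.path.join(d, f)) as im:
            sizes.add(im.size)
print('resolutions found:', sizes)  # should contain exactly one (width, height) pair

# 2. The model input size must be a multiple of 32.
model_input = (512, 512)
print(all(s % 32 == 0 for s in model_input))  # should be True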

All of the above requirements were already satisfied in my first experiments using the BDD100K dataset. However, I did the necessary resizing for Mapillary Vistas and tried again. The results didn’t improve at all.

This is my Python script that resizes the images and labels. Instead of padding, I simply stretch them to 512x512. This shouldn’t be a major issue, because all of the images are already somewhat close to square.

#!/usr/bin/env python3

import os
import cv2 as cv
import numpy as np
from PIL import Image

TRAIN_IMAGE_DIR = '/path/to/mapillary_vistas_2/training/images'
TRAIN_LABEL_DIR = '/path/to/mapillary_vistas_2/training/v2.0/labels'
VAL_IMAGE_DIR = '/path/to/mapillary_vistas_2/validation/images'
VAL_LABEL_DIR = '/path/to/mapillary_vistas_2/validation/v2.0/labels'

image_dirs = [TRAIN_IMAGE_DIR, VAL_IMAGE_DIR]
label_dirs = [TRAIN_LABEL_DIR, VAL_LABEL_DIR]
target_size = (512, 512)
dir_suffix = '_{}x{}'.format(*target_size)

def resize_image(filename_old, filename_new):
    # Read the image unchanged and stretch it to the target size.
    # INTER_AREA gives reasonable quality when downscaling photos.
    image = cv.imread(filename_old, cv.IMREAD_UNCHANGED)
    image = cv.resize(image, dsize=target_size, interpolation=cv.INTER_AREA)
    cv.imwrite(filename_new, image)

def resize_label(filename_old, filename_new):
    # Read the palette PNG as a single-channel array of class IDs and use
    # nearest-neighbor interpolation so that no spurious class values are introduced.
    image = np.array(Image.open(filename_old).convert('P'))
    image = cv.resize(image, dsize=target_size, interpolation=cv.INTER_NEAREST)
    cv.imwrite(filename_new, image)

def process_dir(dirname_old, resize_fun):
    dirname_new = os.path.normpath(dirname_old) + dir_suffix
    if os.path.exists(dirname_new):
        print('{} already exists, skipping'.format(dirname_new))
        return
    os.mkdir(dirname_new)
    files = os.listdir(dirname_old)
    for f in files:
        filename_old = os.path.join(dirname_old, f)
        filename_new = os.path.join(dirname_new, f)
        resize_fun(filename_old, filename_new)

for d in image_dirs:
    process_dir(d, resize_image)

for d in label_dirs:
    process_dir(d, resize_label)

Here is an example label:

The spec file is the same as before, except that the file paths are adjusted for the resized images and labels:
model.txt (19.5 KB)

All commands are the same as in the previous post, except that I used the TLT docker image instead of the TAO one. Also, “my_folder” is a new, empty folder.

This is the training loss:
losses

And this is the inference mask corresponding to the label above. In fact, in this case the inference contains only one value, namely 11:

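This can also be seen directly from the predicted mask PNGs, for example with a small check like this (the mask path is a placeholder for one of the files under the -o output folder):

import numpy as np
from PIL import Image

# Placeholder path: one of the predicted mask PNGs written by unet inference.
mask = np.array(Image.open('/workspace/my_folder/infer/example_mask.png'))
print(np.unique(mask))  # in my case this prints only [11]
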
We can probably conclude that the Volta architecture is not supported by UNet. I have tested with a V100 and a GTX 1080. I don’t have access to non-Volta GPUs right now, but the few others who have been able to use UNet successfully seem to be using GPU architectures other than Volta.

So, do you mean you can train and run inference well with a GTX 1080?

No, I can’t. Sorry, I was thinking that the GTX 1080 used Volta, but it’s based on Pascal, which is even older. So my hypothesis is that UNet requires hardware newer than Volta. Of course that’s just a guess, but I don’t know what else could explain the situation.

No, it is not related to Volta. For training on Mapillary Vistas, you can refer to: Different result between tlt-infer and trt engine unet segmentation model

Yes, in that topic they are using a Nano, which according to Wikipedia was released in 2019, so it’s newer than the V100 or GTX 1080.

Well at least for inference. Not sure what was used for training.

Not correct. He just generated the TensorRT engine and ran inference on the Nano. He does not mention which dGPU was used for training. I will ask him.

Also, the Nano is not suitable for training due to its low compute capability. Please see the hardware requirements in the TAO Toolkit Quick Start Guide — TAO Toolkit 3.22.05 documentation.

Also, the Nano has a lower compute capability than the V100 or GTX 1080. Please refer to CUDA GPUs - Compute Capability | NVIDIA Developer.

I saw your post in another topic: Problems encountered in training unet and inference unet - #27 by Morganh

I did the PNG conversion, and I also adjusted the regularization weight and the crop_and_resize_prob parameter as you suggested. Otherwise, I had the same settings and the same Mapillary Vistas dataset as in my previous experiment above. Now I got the NaN error during the first epoch, even though AMP was not enabled.

Please do not run evaluation against the 1st epoch’s .tlt or an intermediate .tlt. Let the training run further so that the loss drops to a low value. After training, run inference to check whether the results are as expected. Then run evaluation.

It is not possible to continue training after the NaN error occurs. I didn’t run any evaluation or inference.

I did the PNG conversion but skipped the regularization weight and data augmentation adjustments, and now training actually works with Mapillary Vistas. For completeness, I will list all of the necessary steps in this message, although most of them are copied from my previous messages.

  1. Image and label resizing using Python:
#!/usr/bin/env python3

import os
import cv2 as cv
import numpy as np
from PIL import Image

TRAIN_IMAGE_DIR = '/path/to/mapillary_vistas_2/training/images'
TRAIN_LABEL_DIR = '/path/to/mapillary_vistas_2/training/v2.0/labels'
VAL_IMAGE_DIR = '/path/to/mapillary_vistas_2/validation/images'
VAL_LABEL_DIR = '/path/to/mapillary_vistas_2/validation/v2.0/labels'

image_dirs = [TRAIN_IMAGE_DIR, VAL_IMAGE_DIR]
label_dirs = [TRAIN_LABEL_DIR, VAL_LABEL_DIR]
target_size = (512, 512)
dir_suffix = '_{}x{}'.format(*target_size)

def resize_image(filename_old, filename_new):
    # Read the image unchanged and stretch it to the target size.
    # INTER_AREA gives reasonable quality when downscaling photos.
    image = cv.imread(filename_old, cv.IMREAD_UNCHANGED)
    image = cv.resize(image, dsize=target_size, interpolation=cv.INTER_AREA)
    cv.imwrite(filename_new, image)

def resize_label(filename_old, filename_new):
    # Read the palette PNG as a single-channel array of class IDs and use
    # nearest-neighbor interpolation so that no spurious class values are introduced.
    image = np.array(Image.open(filename_old).convert('P'))
    image = cv.resize(image, dsize=target_size, interpolation=cv.INTER_NEAREST)
    cv.imwrite(filename_new, image)

def process_dir(dirname_old, resize_fun):
    dirname_new = os.path.normpath(dirname_old) + dir_suffix
    if os.path.exists(dirname_new):
        print('{} already exists, skipping'.format(dirname_new))
        return
    os.mkdir(dirname_new)
    files = os.listdir(dirname_old)
    for f in files:
        filename_old = os.path.join(dirname_old, f)
        filename_new = os.path.join(dirname_new, f)
        resize_fun(filename_old, filename_new)

for d in image_dirs:
    process_dir(d, resize_image)

for d in label_dirs:
    process_dir(d, resize_label)
  2. JPG-to-PNG conversion. This could probably be done by modifying the Python script above (a sketch of such a modification is included after this list), but since I had already run it, I used ImageMagick’s convert instead:
mkdir /path/to/mapillary_vistas_2/training/images_512x512_png
for f in /path/to/mapillary_vistas_2/training/images_512x512/*.jpg
do
    convert "$f" /path/to/mapillary_vistas_2/training/images_512x512_png/$(basename "$f" .jpg).png
done

mkdir /path/to/mapillary_vistas_2/validation/images_512x512_png
for f in /path/to/mapillary_vistas_2/validation/images_512x512/*.jpg
do
    convert "$f" /path/to/mapillary_vistas_2/validation/images_512x512_png/$(basename "$f" .jpg).png
done
  3. Docker launch:
docker run -it --gpus all \
  -v /path/to/mapillary_vistas_2:/mapillary_vistas_2:ro \
  -v /path/to/my_folder:/workspace/my_folder \
  nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3
  4. Model download (note that this pulls from the TAO registry path even though the docker image is TLT):
ngc registry model download-version nvidia/tao/pretrained_semantic_segmentation:resnet18
  5. Spec file and training command:
    model.txt (19.5 KB)
unet train \
  -e /workspace/my_folder/model.txt \
  -m /workspace/pretrained_semantic_segmentation_vresnet18/resnet_18.hdf5 \
  -r /workspace/my_folder/output \
  -k my_key
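
As a side note to step 2, the JPG-to-PNG conversion could also be done in Python along the same lines as the resizing script; a minimal sketch with placeholder paths:

import os
import cv2 as cv

SRC_DIR = '/path/to/mapillary_vistas_2/training/images_512x512'      # placeholder
DST_DIR = '/path/to/mapillary_vistas_2/training/images_512x512_png'  # placeholder

os.makedirs(DST_DIR, exist_ok=True)
for f in os.listdir(SRC_DIR):
    if f.lower().endswith('.jpg'):
        # Re-encode the JPG as PNG with the same base name.
        image = cv.imread(os.path.join(SRC_DIR, f), cv.IMREAD_UNCHANGED)
        cv.imwrite(os.path.join(DST_DIR, os.path.splitext(f)[0] + '.png'), image)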

This is the training loss:
losses

And this is the inference result:

Things to note:

  • Automatic mixed precision was not used
  • All images were png
  • All image aspect ratios matched the network input aspect ratio
  • All images actually had the same size as the network input, but I assume this is not necessary

Also, when resizing labels, it is important to make sure that the output is in single-channel format and that the interpolation method is chosen such that no spurious labels are introduced. The above Python script does the resizing properly.
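
A quick sanity check on one resized label could look like this (paths are placeholders):

import numpy as np
from PIL import Image

# Original palette label and its resized counterpart (placeholder paths).
orig = np.array(Image.open('/path/to/labels/example.png').convert('P'))
resized = np.array(Image.open('/path/to/labels_512x512/example.png'))

print(resized.ndim)  # should be 2, i.e. a single channel
# Nearest-neighbor resizing must not introduce class IDs that were absent from the original.
print(set(np.unique(resized)) <= set(np.unique(orig)))  # should be True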

I won’t mark this as solved yet, because I want to train the network using the BDD100K dataset which was my original goal. I also want to run inference with different image sizes. Next, I will investigate whether I can achieve these goals.

I tried the BDD100K dataset with the new TAO image and it works, as long as the images are converted to PNG. No resizing was needed, and the labels worked out of the box. Here are my final conclusions regarding UNet:

  • Automatic mixed precision must be disabled. Otherwise, the loss function will become NaN and the training will terminate prematurely.
  • All input images must be PNG. Otherwise, the loss function value will not decrease and the inference results will be very poor.
  • It is recommended to use the new TAO docker image. Otherwise, some non-trivial manual resizing needs to be done for the images and labels.
  • The training is extremely sensitive to the spec file values. For example, changing the regularization weight to 2e-06 and setting crop_and_resize_prob to 0.01 caused the training to fail and terminate prematurely. An example spec file that works can be found in my previous message.

Thanks for the info.
But regarding “changing the regularization weight to 2e-06 and setting crop_and_resize_prob to 0.01 will cause the training to fail and terminate prematurely”, this will vary case by case. If the regularization weight is set lower, then the relative weight of the dice loss and cross-entropy loss will be higher.
In any case, end users can fine-tune these parameters.