Training multi-class UNet does not converge

Yes, it can.
For an example, see this user's case: "Different result between tlt-infer and trt engine unet segmentation model".