Different results between tlt-infer and TRT engine for UNet segmentation model

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) Nano
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Unet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v3.0-py3
• Training spec file(If have, please share here) experiment_spec.txt (16.9 KB)

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Hello, I am having trouble reproducing the results I see when running tlt-infer on a trained UNet model with the same model exported and converted to a TRT engine. The segmentation result with tlt-infer is much more accurate, whereas the TRT engine produces more errors and the masks seem to be offset from the objects. I have checked many times that I do the preprocessing as required, run on the same image, etc. The issue persists with both an fp16 and an fp32 engine. I attach the result from tlt-infer and a mask for the lane-marking class produced by the TRT engine to illustrate the kind of difference. Please let me know if you need extra input from me, like models or specs.

May I know how you generated the trt engine? Was it with tlt-converter?

Yes, I used tlt-converter on the Jetson Nano.

How about the “mask for lane marking class produced by trt engine”? Is it the result of running inference with DeepStream?

No, this is the result of a custom script where I use the Python API of TensorRT to run the engine and save a binary mask for each class in the picture separately.
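
Roughly, the script does the following (a simplified sketch, not my exact code; it assumes TensorRT 7.x with pycuda, an explicit-batch engine where binding 0 is the input and binding 1 the output, and an NHWC softmax output; file names are illustrative):

import cv2
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built by tlt-converter
with open("unet.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
context.set_binding_shape(0, (1, 3, 512, 512))  # fix the dynamic input shape

# Placeholder for a preprocessed float32 NCHW frame
img = np.random.rand(1, 3, 512, 512).astype(np.float32)

d_in = cuda.mem_alloc(img.nbytes)
out_shape = tuple(context.get_binding_shape(1))  # e.g. (1, 512, 512, C)
h_out = np.empty(out_shape, dtype=np.float32)
d_out = cuda.mem_alloc(h_out.nbytes)

cuda.memcpy_htod(d_in, np.ascontiguousarray(img))
context.execute_v2([int(d_in), int(d_out)])
cuda.memcpy_dtoh(h_out, d_out)

# Argmax over the class axis, then save one binary mask per class
labels = np.argmax(h_out, axis=-1)[0]
for c in range(out_shape[-1]):
    mask = (labels == c).astype(np.uint8) * 255
    cv2.imwrite(f"class_{c}.png", mask)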

OK, if possible, please try to run inference on the Nano with DeepStream. You can refer to the spec in deepstream_tlt_apps/pgie_peopleSemSegNet_tlt_config.txt at release/tlt3.0 · NVIDIA-AI-IOT/deepstream_tlt_apps · GitHub


Here’s what I get with this config:
config.txt (3.3 KB)

Pretty much the same as with the custom code, and the same for the other classes as well.

Please double-check the config file when running inference in DeepStream.
In deepstream_tlt_apps/pgie_peopleSemSegNet_tlt_config.txt at release/tlt3.0 · NVIDIA-AI-IOT/deepstream_tlt_apps · GitHub, it is model-color-format=0.
Also, it seems that num-detected-classes=12 does not match your training spec.

Also, to narrow down the issue, please generate the trt engine in the tlt docker instead of on the Nano,
and run tlt unet inference against this “.engine” or “.trt” file.

  1. I use the values that were generated by tlt-export with the --gen-ds-config option, and it says the model is BGR, i.e. color format 1.
  2. It does, because there I merge classes and reduce them to 12.
  3. OK, I will.

Generated config file:
net-scale-factor=0.00784313725490196
offsets=127.5;127.5;127.5
infer-dims=3;512;512
tlt-model-key=nvidia_tlt
network-type=2
num-detected-classes=12
model-color-format=1
segmentation-threshold=0.0
output-blob-names=softmax_1
segmentation-output-order=1
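
For reference, DeepStream applies y = net-scale-factor * (x - offset) per channel, so these values map pixels from [0, 255] to [-1, 1] in BGR order. A minimal sketch of the equivalent preprocessing (not my exact code; OpenCV assumed, file name illustrative):

import cv2
import numpy as np

def preprocess(path, width=512, height=512):
    img = cv2.imread(path)                     # OpenCV loads BGR, matching model-color-format=1
    img = cv2.resize(img, (width, height))
    img = img.astype(np.float32)
    img = 0.00784313725490196 * (img - 127.5)  # net-scale-factor * (x - offset): [0, 255] -> [-1, 1]
    img = img.transpose(2, 0, 1)[None]         # HWC -> NCHW with batch dimension
    return np.ascontiguousarray(img)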

For

“generate trt engine in the tlt docker”

could you please point me to instructions on how to do this?

The tlt-converter is included in the tlt docker by default.
You can run $ tlt unet run tlt-converter xxx

Or, you can log in to the docker directly via $ tlt unet run /bin/bash, and then run tlt-converter xxx


This is the result; could it be an issue with the converter?

Thanks for the info. I will check further.
Could you please share the command line you used to generate the trt engine inside the tlt docker?

tlt unet run tlt-converter -k nvidia_tlt -t fp32 -e unet.engine -p input_1,1x3x512x512,1x3x512x512,10x3x512x512 pruned.etlt

Thanks. Could you please try the experiment below?
Please generate a trt engine based on the unpruned etlt model, and run inference again.

engine:

tlt-infer:

Thanks for the info. Will check further.

I cannot reproduce this with one of the official purpose-built UNet models. Please follow the steps below to check whether you can get the same result as mine.
Then, please double-check your previous result. If possible, please share your tlt model, etlt model, and test image with me.

For the model, see NVIDIA NGC

My steps:

  1. Download tlt model, etlt model and test image.

$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplesemsegnet/versions/deployable_v1.0/files/peoplesemsegnet.etlt
$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplesemsegnet/versions/trainable_v1.0/files/peoplesemsegnet.tlt
$ wget https://developer.nvidia.com/sites/default/files/akamai/NGC_Images/models/peoplenet/input_11ft45deg_000070.jpg

  2. Generate trt engine
    $ tlt-converter -k tlt_encode -t fp32 -e peoplesemsegnet.engine -p input_1,1x3x544x960,1x3x544x960,10x3x544x960 peoplesemsegnet.etlt

  3. Generate spec.txt

random_seed: 42
model_config {
  num_layers: 18
  all_projections: true
  arch: "vanilla_unet_dynamic"
  use_batch_norm: true
  training_precision {
    backend_floatx: FLOAT32
  }
  model_input_height: 544
  model_input_width: 960
  model_input_channels: 3
}

training_config {
  batch_size: 2
  epochs: 30
  log_summary_steps: 10
  checkpoint_interval: 1
  learning_rate: 0.0001
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
}

dataset_config {
  dataset: "custom"
  augment: False
  augmentation_config {
    spatial_augmentation {
      hflip_probability: 0.5
      vflip_probability: 0.5
      crop_and_resize_prob: 0.5
    }
    brightness_augmentation {
      delta: 0.2
    }
  }
  input_image_type: "color"
  test_images_path: "/workspace/demo_3.0/unet/test_image"
  data_class_config {
    target_classes {
      name: "person"
      label_id: 0
      mapping_class: "person"
    }
    target_classes {
      name: "background"
      label_id: 1
      mapping_class: "background"
    }
  }
}

  4. Run inference with the tlt model.

tlt unet inference -e spec.txt \
                   -m peoplesemsegnet.tlt \
                   -o result_tlt \
                   -k tlt_encode \
                   -v

Result:

  5. Run inference with the trt engine.

tlt unet inference -e spec.txt \
                   -m peoplesemsegnet.engine \
                   -o result_trt_engine \
                   -k tlt_encode \
                   -v

Result:

I ran the test you suggested, and I also get absolutely identical results for the tlt and trt inferences.
Then I retried training the model to be 100% sure I did not miss something in the configs. I stopped the training midway, ran inference, and got this:

Then I exported to etlt and converted to trt using the same commands as mentioned above. Then I ran inference on the trt engine and got this:

As you can see, the trt result is much closer to tlt, but still noticeably off. It seems like the objects’ masks are somehow skewed to the left, especially on the left lane marking. The difference probably grows larger with more training, because in my first examples you can see the same effect to a greater extent.
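
To quantify the shift rather than just eyeballing it, the two masks can be compared per class, e.g. IoU plus the horizontal centroid offset in pixels (a quick sketch; file names are illustrative):

import cv2
import numpy as np

def compare_masks(path_a, path_b):
    # Binary 0/255 PNG masks from the two pipelines are assumed
    a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE) > 127
    b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE) > 127
    iou = np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)
    # Negative offset means mask b sits to the left of mask a
    x_a = np.nonzero(a)[1].mean() if a.any() else float("nan")
    x_b = np.nonzero(b)[1].mean() if b.any() else float("nan")
    return iou, x_b - x_a

iou, dx = compare_masks("mask_tlt_lane.png", "mask_trt_lane.png")
print(f"IoU={iou:.3f}, horizontal shift={dx:.1f} px")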

Since your trial used vanilla_unet_dynamic and mine uses resnet18, could you please run a trial with a resnet-based model to see if that is where the issue lies?

Here I attach:
tlt model: model.step-9000.tlt (Yandex Disk)
etlt model: model_1.etlt (Yandex Disk)

training spec experiment_config_18.cfg (19.5 KB)

test image

test image

Thanks, I can reproduce your result with your models. Still checking.

BTW, is your training dataset a public one? If yes, could you share the link? Thanks.