Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) Nano
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Unet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v3.0-py3
• Training spec file(If have, please share here) experiment_spec.txt (16.9 KB)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
Hello, I am having trouble reproducing the results I see when running tlt-infer on a trained unet model versus the same model exported and converted to a trt engine. The segmentation result from tlt-infer is much more accurate, whereas the trt engine produces more errors and the masks appear offset from the objects. I have checked many times that I do the preprocessing as required, run on the same image, etc. The issue persists with both an fp16 and an fp32 engine. I attach the result from tlt-infer and a mask for the lane-marking class produced by the trt engine, to illustrate the kind of difference. Please let me know if you need extra input from me, such as models or specs.
May I know how you generated the trt engine? Did you use tlt-converter?
Yes, I used tlt-converter on the Jetson Nano.
How about the “mask for lane marking class produced by trt engine”? Is it the result of running inference with deepstream?
No, this is the result of a custom script where I use the Python API of TensorRT to run the engine and save the binary mask of each class in the picture separately.
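For reference, the per-class mask extraction in a script like this typically boils down to an argmax over the class dimension of the network output. A minimal numpy sketch, assuming the engine emits logits in (num_classes, H, W) layout (the exact output layout of the real engine is an assumption):

```python
import numpy as np

def masks_from_logits(logits):
    """Split raw network output of shape (num_classes, H, W) into
    one binary uint8 mask (0/255) per class via per-pixel argmax."""
    class_ids = np.argmax(logits, axis=0)          # (H, W) label map
    return [((class_ids == c) * 255).astype(np.uint8)
            for c in range(logits.shape[0])]

# Toy example: 3 classes on a 2x2 image
logits = np.zeros((3, 2, 2), dtype=np.float32)
logits[1, 0, :] = 5.0   # top row scores highest for class 1
logits[2, 1, :] = 5.0   # bottom row scores highest for class 2
masks = masks_from_logits(logits)
```

Each returned mask can then be written out with any image library; the argmax step itself is independent of how the engine buffers are bound.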
Here's what I get with this config.
Pretty much the same as with the custom code, for the other classes as well.
Please double check the config file when you run inference in DeepStream.
In deepstream_tlt_apps/pgie_peopleSemSegNet_tlt_config.txt at release/tlt3.0 · NVIDIA-AI-IOT/deepstream_tlt_apps · GitHub, it is
Also, it seems that num-detected-classes=12 does not match your training spec.
Additionally, to narrow down the issue, please generate the trt engine in the tlt docker instead of on the Nano, then run tlt unet inference against this “.engine” or “.trt” file.
- I use the values that were generated by tlt-export with the --gen-ds-config option, and it says the model is BGR, i.e. color format 1.
- It does, because there I merge classes and reduce them to 12.
- OK, I will.
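As a sanity check for the preprocessing discussion above, a DeepStream-style input pipeline for a BGR model (model-color-format=1) generally means: swap channels, go to CHW, then apply offset and scale. A small sketch; the scale and offset values here are placeholders, not the ones from my generated config:

```python
import numpy as np

def preprocess(rgb, scale=1.0 / 127.5, offset=127.5):
    """Mimic DeepStream-style preprocessing for a BGR model:
    RGB -> BGR channel swap, HWC -> CHW, then (pixel - offset) * scale,
    with a leading batch dimension. Scale/offset are placeholder values."""
    bgr = rgb[..., ::-1].astype(np.float32)        # RGB -> BGR
    chw = np.transpose(bgr, (2, 0, 1))             # HWC -> CHW
    return ((chw - offset) * scale)[np.newaxis]    # (1, 3, H, W)

img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255                                  # pure red RGB image
batch = preprocess(img)
```

With these placeholder values a pure red input maps to +1.0 in the B-G-R channel order's last (red) plane and -1.0 elsewhere, which makes a quick way to verify the channel swap is actually happening.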
generated config file:
generate trt engine in the tlt docker
Could you please point me to instructions on how to do this?
The tlt-converter is a tool available in the tlt docker by default.
You can run
$ tlt unet run tlt-converter xxx
Or, you can directly log in to the docker via
$ tlt unet run /bin/bash
and then run tlt-converter inside.
This is the result. Does it seem to be an issue with the converter?
Thanks for the info. I will check further.
Could you please share the command line you used to generate the trt engine inside the tlt docker?
tlt unet run tlt-converter -k nvidia_tlt -t fp32 -e unet.engine -p input_1,1x3x512x512,1x3x512x512,10x3x512x512 pruned.etlt
Thanks. Could you please try the following experiment?
Please generate a trt engine based on the unpruned etlt model, and run inference again.
Thanks for the info. Will check further.
I cannot reproduce this with one official purpose-built unet model. Please follow the steps below to check whether you get the same result as mine.
Then, please double check your previous result. If possible, you can share your tlt model, etlt model, and test image with me.
For the model, see NVIDIA NGC
- Download tlt model, etlt model and test image.
$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplesemsegnet/versions/deployable_v1.0/files/peoplesemsegnet.etlt
$ wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplesemsegnet/versions/trainable_v1.0/files/peoplesemsegnet.tlt
$ wget https://developer.nvidia.com/sites/default/files/akamai/NGC_Images/models/peoplenet/input_11ft45deg_000070.jpg
- Generate trt engine
$ tlt-converter -k tlt_encode -t fp32 -e peoplesemsegnet.engine -p input_1,1x3x544x960,1x3x544x960,10x3x544x960 peoplesemsegnet.etlt
hflip_probability : 0.5
vflip_probability : 0.5
crop_and_resize_prob : 0.5
- Run inference with tlt model.
tlt unet inference -e spec.txt
- Run inference with trt engine.
tlt unet inference -e spec.txt
I ran the test you suggested, and I also get absolutely identical results for the tlt and trt inferences.
Then I retrained the model to be 100% sure I did not miss something in the configs. I stopped the training midway, ran inference, and got this.
Then I exported to etlt and converted to trt using the same commands as mentioned above. Then I ran inference on the trt engine and got this.
As you can see, the trt result is much closer to the tlt one, but still noticeably off. It seems like the objects' masks are somehow skewed to the left, especially on the left lane marking. The difference probably grows larger with more training, because in my first examples you can see the same effect, but to a greater extent.
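The "skewed to the left" impression could be quantified by estimating the horizontal offset that best aligns the two masks, e.g. with a brute-force search over small shifts. A sketch with toy arrays standing in for the real masks:

```python
import numpy as np

def best_horizontal_shift(ref, moved, max_shift=10):
    """Return the horizontal shift (in pixels) of `moved` relative to
    `ref` that maximizes mask overlap; a negative value means `moved`
    sits to the left of `ref`."""
    ref = ref.astype(bool)
    best, best_score = 0, -1
    for s in range(-max_shift, max_shift + 1):
        rolled = np.roll(moved.astype(bool), -s, axis=1)
        score = np.logical_and(ref, rolled).sum()
        if score > best_score:
            best, best_score = s, score
    return best

# Toy mask: a vertical stripe, then the same stripe moved 3 px left
ref = np.zeros((8, 16), dtype=np.uint8); ref[:, 6:10] = 1
shifted = np.roll(ref, -3, axis=1)
offset = best_horizontal_shift(ref, shifted)
```

A consistently non-zero offset across classes would support the hypothesis of a systematic spatial misalignment introduced somewhere in the export/convert path rather than random segmentation noise.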
Since your trial was using vanilla-unet-dynamic and mine uses resnet18, could you please make a trial with a resnet based model to see if that is where the issue lies?
Here I attach:
tlt model model.step-9000.tlt — Yandex Disk
etlt model model_1.etlt — Yandex Disk
training spec experiment_config_18.cfg (19.5 KB)
Thanks, I can reproduce your result with your models. Still checking.
BTW, is your training dataset a public one? If yes, could you share the link? Thanks.