FCN Segmentation: Model imported with TensorRT gives significantly worse results than pycaffe

Linux distro and version: Ubuntu 18.04
GPU type: Titan X
nvidia driver version: 410.79
CUDA version: V10.0.130
CUDNN version: 7.4.2.24
Python version: 3.6.7
Caffe version: Anaconda python 3.6 installation with “conda install -c anaconda caffe-gpu”
Tensorflow version -
TensorRT version: GA_5.0.2.6
If Jetson, OS, hw versions -

I have been running inference on a custom-trained skin detector in Caffe for a while. I attempted to port the implementation to TensorRT 5 to speed up inference. However, my predictions are considerably worse (not so far off that it doesn’t work, just more error-prone). I’m having trouble understanding what is causing this problem. I’m wondering whether unsupported parameters or layer fusions might be the issue. Any help would be appreciated. I have tried inference at different image sizes, to no avail.

In addition, there is one major difference between the two inference methods: Caffe handles inputs of any size perfectly well, whereas with TensorRT the mask appears shifted a few pixels from its original location unless the image dimensions fit 10+(16*x).
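The size constraint above can be checked with a small helper. This is only a minimal sketch: the function names are hypothetical, and the 10+(16*x) pattern is taken from the observation above rather than from any TensorRT documentation. It tests whether a dimension fits the pattern and computes how much padding would bring an image up to the next size that does.

```python
def is_aligned(n, offset=10, stride=16):
    """True if n can be written as offset + stride*x for some integer x >= 0."""
    return n >= offset and (n - offset) % stride == 0


def nearest_aligned(n, offset=10, stride=16):
    """Smallest value of the form offset + stride*x (x >= 0) that is >= n."""
    if n <= offset:
        return offset
    x = -(-(n - offset) // stride)  # ceiling division on integers
    return offset + stride * x


def padding_for(height, width):
    """Extra rows and columns needed so both dimensions fit 10+(16*x)."""
    return nearest_aligned(height) - height, nearest_aligned(width) - width
```

For example, `padding_for(500, 26)` reports that a 500x26 input would need 6 extra rows and no extra columns. Padding the input and cropping the mask afterwards is a possible workaround under these assumptions, not a fix for the underlying shift.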

I have tried both the C++ and the Python interfaces. Since the Python one is briefer, I’m attaching the TensorRT and Caffe scripts I run to obtain inference results with each.

Thanks,

Alp


Files.zip (280 KB)

Hello,

To help us debug, can you share the model file, model_file=’/Models/Skincolor_V_25/snapshot_iter_449800.caffemodel’?

Hello,

Using your repro, I’m seeing:

Caffe (nvcr.io/nvidia/caffe:19.01-py2): function took 338.091135025ms
TensorRT (nvcr.io/nvidia/tensorrt:19.01-py3): function took 15.459299087524414ms

Thanks for your answer.

Yes, there is definitely a speed increase with both the C++ and the Python interfaces. However, there is also a decrease in segmentation quality.

I updated my code to display Jaccard scores, and I’m uploading 4 images with their ground truth masks. The TensorRT interface achieves a mean Jaccard similarity score of 0.640, compared to 0.737 for Caffe. The difference can also be seen in the images overlaid with the resulting masks.
Files_with_jaccard_score.zip (1.98 MB)
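For readers without the attachment, the Jaccard similarity between a predicted mask and a ground truth mask can be computed roughly as follows. This is a sketch, not the code in the zip file: it assumes NumPy, binary masks of equal shape, and the hypothetical function names `jaccard_score` and `mean_jaccard`; treating two empty masks as a perfect match is also an assumption.

```python
import numpy as np


def jaccard_score(pred_mask, gt_mask):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection) / float(union)


def mean_jaccard(pred_masks, gt_masks):
    """Average Jaccard score over paired lists of masks."""
    scores = [jaccard_score(p, g) for p, g in zip(pred_masks, gt_masks)]
    return float(np.mean(scores))
```

Running both the Caffe and the TensorRT masks through the same function against the same ground truth keeps the comparison independent of either framework.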

I tried the same code with TensorRT 5.1.2 RC, and there is still no change.

I have tried the code with the latest version of TensorRT, 5.1.5. It seems that the bug was caused by the handling of Caffe’s crop layer. The bug has been fixed. Thanks a lot.