That’s correct. These are the results from Jetson-Inference (the latest version for JetPack 3.1) with the fcn-alexnet-pascal-voc model.
Those don’t match the results when doing inference on that same image within DIGITS using the fcn-alexnet-pascal-voc model. In DIGITS, inference is able to detect the car and highlight it. It’s far from perfect, but it works.
I haven’t had any issues with the FCN-Alexnet-Aerial-FPV-720p working on images it’s designed to work with.
The problem I’m having is with the fcn-alexnet-pascal-voc model example within Jetson-Inference. I did submit an issue on the GitHub repository, but I haven’t gotten any response.
So I figured I’d check here to see if anyone has played around with segmentation on the Jetson TX1/TX2.
Hi S4WRXTTCS, I added those smaller vehicle_*.jpg images to the repo for testing the googlenet-12 model (imagenet), so they seem to be the wrong resolution for use with segmentation. They were pulled from the imagenet dataset for use with the step of the tutorial where 14 classes of vehicles are combined down to 1 class (and others).
I will look into the Pascal-VOC model with the latest TensorRT/JetPack, but in general that model isn’t of great quality, because the dataset is on the smaller end for deep-learning segmentation. That model is provided by the repo mainly for compatibility with the DIGITS semantic-segmentation tutorial, which references Pascal-VOC.
If you wish to detect vehicles with segmentation, I recommend downloading the Cityscapes test videos (https://www.cityscapes-dataset.com/) and using the fcn-alexnet-cityscapes-hd model (2048x2048) or fcn-alexnet-cityscapes-sd (1024x1024). The output looks similar to:
The segmentation with TensorRT does not exactly replicate the DIGITS model. The DIGITS model uses a deconvolution layer whose large stride (>100px) means it is removed from the deployed models, both because TensorRT won’t run a deconvolution with that stride and for runtime performance reasons. Hence my code in segNet.cpp computes the overlay manually; it may not always match, but it runs. In particular, I perform bilinear interpolation after the argmax of the class scores, instead of interpolating the 21-dimensional class scores (as the deconv does) and then taking the argmax (again, for performance reasons). Both DIGITS and TensorRT produce the same results before the upscaling (i.e. they are inferring the same low-resolution score grid). As mentioned, the code for that is provided in segNet.cpp, so it can be tweaked.
@dusty_nv: Thanks for the update. I used that image as an example because it was included in the repository and was easy for people to test. I came across the problem when testing images from the Pascal-VOC dataset used to train the model.
I’ve been testing the Pascal-VOC model because I based my own model on it. So I was trying to get it to work before using my own model.
Right now I have two models in DIGITS that work reasonably well. One is FCN-AlexNet with rather coarse outlines, and the other is an FCN-8 model that gives much better results.
My eventual goal is to get inference working on the Jetson-TX2 with the FCN-8 model, but I don’t know if that’s possible.
Have you tried running FCN-8 through TensorRT yet (TensorRT 2, maybe)? I had heard similar things: that FCN-8 gives better results but is significantly slower. Initially I stuck with FCN-Alexnet to get the best performance, but now I am thinking of other fully-convolutional networks to support in the future.
An idea would be to add support for a bilinear upscaling filter, with arbitrary stride. It would run quite fast on the GPU, and that’s what the deconvolution layer is used for in most of these models anyway.
And then you could support some codebase that makes it easier to slot in “plug-in” operator layers, like, say, Caffe2.
And then you could put pressure on the DIGITS folks to provide their nice GUI for that codebase …
I also have the same issue running TensorRT 2.1 on an x86 computer with a GTX 1070. I haven’t tested it with a Jetson TX2 yet.
I used FCN-Alexnet with a custom dataset of high-resolution (2000x2000) pictures. I followed the tutorial in the jetson-inference repo. In DIGITS the inference works properly and the segmentation is almost perfect, but with TensorRT I get the same kind of results as S4WRXTTCS: the label is foggy and completely wrong.
Hi Austriker, are you able to run the pretrained Aerial model (fcn-alexnet-aerial-fpv-720p) on the drone_*.png images that come with the repo?
Also, there is a pretrained Cityscapes model you can test on this image:
If you are unable to get a sensible output from the pretrained models, that may provide a hint as to the source of the issue.
It may also be valuable to visualize the raw output of the network instead of the bilinear interpolation of the overlay. See here for an older version of segNet.cpp which contained a more basic overlay routine to cut/paste. This may help determine whether there is a network issue or a post-processing issue.
I have tested both examples on x86 and jetson TX2 and they both work.
The only difference is that my custom network has only 2 outputs: the background and the object I am searching for. The result is that the image is completely red, which is the color of my label.
I am training my network with 3 classes (to avoid using the first class) to see if it solves the issue.
I have read your answer and know that you have tried running FCN-8 through TensorRT. I want to know how to run FCN-8 through TensorRT; can you write some details for me? I come from China and my English is not good, so it’s hard for me to study the NVIDIA documentation, and I can hardly find a user guide for TensorRT. I really need help to run FCN-8 through TensorRT. Can anyone help me? Thanks!!!
I did use TensorRT 3, but it doesn’t have an example for FCN-8s; it gives three examples for FCN-Alexnet. Another question:
In one of the example folders, FCN-Alexnet-Aerial-FPV-720p, why does its fcn_alexnet.deploy.prototxt use:
The fpv-labels.txt has 3 labels, not 21. Why is num_output 21 and not 3? The original.prototxt num_output is 21 too. Did you train it with DIGITS using num_output: 21? Why not num_output: 3?
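For reference, the num_output in question is set in the scoring layers of the prototxt. A hypothetical fragment is shown below; the exact layer names and parameters in the actual file may differ:

```
# Hypothetical fragment of fcn_alexnet.deploy.prototxt; layer names in
# the actual file may differ. num_output sets the number of class score
# planes the layer produces (21 matches the Pascal-VOC class count).
layer {
  name: "score_fr"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  convolution_param {
    num_output: 21
  }
}
```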