Unable to get segmentation to work with Jetson TX2

Has anyone had much luck with segmentation inference on the Jetson TX1/TX2 with anything other than the fcn-alexnet-aerial-fpv-720p model?

When I use the aerial 720p model with the example image, it works fine and matches the result in the jetson-inference tutorial.

But if I use the fcn-alexnet-pascal-voc model, I don’t get anything close to what I expect, so I have a feeling that I’m doing something terribly stupid.

Here is an example of the results I’m getting with the pascal-voc model.

./segnet-console vehicle_1.jpg test1.png fcn-alexnet-pascal-voc


The arguments should look like this:

./segnet-console vehicle_1.jpg output.jpg \
--prototxt=networks/FCN-Alexnet-Aerial-FPV-720p/fcn_alexnet.deploy.prototxt \
--model=networks/FCN-Alexnet-Aerial-FPV-720p/snapshot_iter_10280.caffemodel \
--labels=networks/FCN-Alexnet-Aerial-FPV-720p/fpv-labels.txt \
--colors=networks/FCN-Alexnet-Aerial-FPV-720p/fpv-deploy-colors.txt \
--input_blob=data


Since it’s one of the built-in models, I get the same invalid results whether I use

./segnet-console vehicle_1.jpg test1.png fcn-alexnet-pascal-voc

or

./segnet-console vehicle_1.jpg output.jpg \
--prototxt=networks/FCN-Alexnet-Pascal-VOC/deploy.prototxt \
--model=networks/FCN-Alexnet-Pascal-VOC/snapshot_iter_146400.caffemodel \
--labels=networks/FCN-Alexnet-Pascal-VOC/pascal-voc-classes.txt \
--colors=networks/FCN-Alexnet-Pascal-VOC/pascal-voc-colors.txt \
--input_blob=data \
--output_blob=score_fr_21classes
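When switching between several model directories, it can help to assemble these explicit flags in a script instead of retyping them. This is a minimal sketch: the helper `segnet_console_cmd` is hypothetical, only the flag names are taken from the commands above, and the file names must match whatever your own network directory contains.

```python
import shlex
from pathlib import PurePosixPath


def segnet_console_cmd(image_in, image_out, net_dir, prototxt, model,
                       labels, colors, input_blob="data", output_blob=None):
    """Build a segnet-console argument list for one model directory.

    Hypothetical helper: the flags mirror the commands in this thread;
    --output_blob is only added when a value is given.
    """
    net = PurePosixPath(net_dir)
    args = [
        "./segnet-console", image_in, image_out,
        f"--prototxt={net / prototxt}",
        f"--model={net / model}",
        f"--labels={net / labels}",
        f"--colors={net / colors}",
        f"--input_blob={input_blob}",
    ]
    if output_blob is not None:
        args.append(f"--output_blob={output_blob}")
    return args


if __name__ == "__main__":
    args = segnet_console_cmd(
        "vehicle_1.jpg", "output.jpg", "networks/FCN-Alexnet-Pascal-VOC",
        "deploy.prototxt", "snapshot_iter_146400.caffemodel",
        "pascal-voc-classes.txt", "pascal-voc-colors.txt",
        output_blob="score_fr_21classes")
    # Print a shell-ready line; or pass `args` to subprocess.run(args, check=True)
    print(shlex.join(args))
```

Run it from the directory that contains `segnet-console`; the printed line matches the explicit Pascal-VOC command above.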

Is the image posted in #1 your results?

With FCN-Alexnet-Aerial-FPV-720p, we get the vehicle_1.jpg result shown in the attachment.
It’s quite different from yours.


That’s correct. Those are the results from Jetson-Inference (the latest version, for JetPack 3.1) with the fcn-alexnet-pascal-voc model.

They don’t match the results of running inference on the same image in DIGITS with the fcn-alexnet-pascal-voc model. In DIGITS, inference detects the car and highlights it. It’s far from perfect, but it works.

I haven’t had any issues with the FCN-Alexnet-Aerial-FPV-720p working on images it’s designed to work with.

The problem I’m having is with the fcn-alexnet-pascal-voc model example within Jetson-Inference. I did submit an issue on the GitHub repository, but I haven’t gotten any response.

So I figured I’d check here to see if anyone has played around with segmentation on the Jetson TX1/TX2.


Thanks for your feedback. We confirmed that the fcn-alexnet-pascal-voc model is not working with Jetson-Inference.
We are checking this issue now and will update you later.

Is the model you used in DIGITS the same one downloaded by Jetson-Inference, or did you train it yourself?

Hi S4WRXTTCS, I added those smaller vehicle_*.jpg images to the repo for testing the googlenet-12 model (ImageNet). They seem to be the wrong resolution for use with segmentation. They were pulled from the ImageNet dataset for the step in the tutorial where 14 classes of vehicles are combined down to 1 class (and others).

I will look into the Pascal-VOC model with the latest TensorRT/JetPack, but in general that model isn’t of very high quality, because the dataset is on the small side for deep-learning segmentation. The repo provides that model mainly for compatibility with the DIGITS semantic-segmentation tutorial, which references Pascal-VOC.

If you wish to detect vehicles with segmentation, I recommend downloading the Cityscapes test videos (https://www.cityscapes-dataset.com/) and using the fcn-alexnet-cityscapes-hd model (2048x2048) or fcn-alexnet-cityscapes-sd (1024x1024). The output looks similar to:

[Image: example segmentation output from the Cityscapes model]

The segmentation with TensorRT does not exactly replicate the DIGITS model. The DIGITS model uses a deconvolution layer with a large stride (>100px), and that layer is removed from the deployed models, both because it won’t run with that stride in TensorRT and for runtime performance reasons. This is why my code in segNet.cpp computes the overlay manually: it may not always match, but it runs. In particular, I perform bilinear interpolation after taking the argmax of the class scores, instead of interpolating the 21-dimensional class scores (as the deconvolution does) and then taking the argmax, again for performance reasons. DIGITS and TensorRT produce the same results before the upscaling (i.e. they infer the same low-resolution score grid). As mentioned, the code for that is in segNet.cpp, so it can be tweaked.
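The difference between the two orderings can be seen on a toy score grid. This is a minimal pure-Python sketch, not the segNet.cpp code: the half-pixel-center coordinate mapping, the edge clamping, and the assumption that the per-cell class *colors* are what gets interpolated after the argmax are all my own choices for illustration.

```python
from typing import Sequence


def bilinear_sample(grid: Sequence[Sequence[float]], gy: float, gx: float) -> float:
    """Bilinearly sample a 2D grid at fractional (gy, gx), clamping to the edges."""
    h, w = len(grid), len(grid[0])
    gy = min(max(gy, 0.0), h - 1.0)
    gx = min(max(gx, 0.0), w - 1.0)
    y0, x0 = int(gy), int(gx)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = gy - y0, gx - x0
    top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
    bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def to_grid(o: int, out_size: int, in_size: int) -> float:
    """Map an output pixel index to a coarse-grid coordinate (pixel centers aligned)."""
    return (o + 0.5) * in_size / out_size - 0.5


def argmax_then_upscale(scores, palette, out_h, out_w):
    """Argmax on the coarse grid, then interpolate the 3 color channels per pixel.

    scores: [C][H][W] class scores; palette: one (r, g, b) per class.
    Returns out_h x out_w tuples of floats (blended colors at class borders).
    """
    C, H, W = len(scores), len(scores[0]), len(scores[0][0])
    labels = [[max(range(C), key=lambda c: scores[c][y][x]) for x in range(W)]
              for y in range(H)]
    color = [[[float(palette[labels[y][x]][ch]) for x in range(W)] for y in range(H)]
             for ch in range(3)]
    return [[tuple(bilinear_sample(color[ch], to_grid(oy, out_h, H), to_grid(ox, out_w, W))
                   for ch in range(3))
             for ox in range(out_w)]
            for oy in range(out_h)]


def upscale_then_argmax(scores, out_h, out_w):
    """Deconvolution-style order: interpolate all C class scores, then argmax per pixel."""
    C, H, W = len(scores), len(scores[0]), len(scores[0][0])
    out = []
    for oy in range(out_h):
        gy = to_grid(oy, out_h, H)
        row = []
        for ox in range(out_w):
            gx = to_grid(ox, out_w, W)
            interp = [bilinear_sample(scores[c], gy, gx) for c in range(C)]
            row.append(max(range(C), key=interp.__getitem__))
        out.append(row)
    return out
```

For a 2x2 grid where only the bottom-right cell is class 1, `upscale_then_argmax` yields crisp per-pixel labels, while `argmax_then_upscale` produces intermediate colors along the class border. The first ordering costs C interpolations per output pixel (21 for Pascal-VOC) against 3 for the color ordering, which is the performance trade-off described above.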

@dusty_nv: Thanks for the update. I used that image as an example because it was included in the repository and was easy for people to test. I came across the problem when testing images from the Pascal VOC dataset used to train the model.

I’ve been testing the Pascal-VOC model because I based my own model on it. So I was trying to get it to work before using my own model.

Right now I have two models that work reasonably well within DIGITS. One is FCN-AlexNet, with rather coarse outlines, and the other is an FCN-8 model that gives much better results.

My eventual goal is to get inference working on the Jetson-TX2 with the FCN-8 model, but I don’t know if that’s possible.

The model I used in DIGITS is one I trained myself, following the DIGITS Semantic Segmentation tutorial.

So technically they are different models, but they were trained with the same data/setup.

Have you tried running FCN-8 through TensorRT yet (TensorRT 2, maybe)? I had heard similar things: that FCN-8 gives better results but is significantly slower. Initially I stuck with FCN-Alexnet to get the best performance, but now I am thinking of other fully-convolutional networks to support in the future.

An idea would be to add support for a bilinear upscaling filter, with arbitrary stride. It would run quite fast on the GPU, and that’s what the deconvolution layer is used for in most of these models anyway.
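The claim that a deconvolution layer with bilinear weights is just bilinear upscaling at an arbitrary stride can be checked in 1D. This is a minimal sketch assuming Caffe's `bilinear` weight filler formula as I understand it (the 2D filler being the outer product of this 1D factor) and a pad-0 deconvolution; the function names are hypothetical.

```python
import math


def caffe_bilinear_kernel_1d(kernel_size):
    """1D factor of Caffe's "bilinear" weight filler (assumed formula)."""
    f = math.ceil(kernel_size / 2)
    c = (kernel_size - 1) / (2.0 * f)
    return [1.0 - abs(i / f - c) for i in range(kernel_size)]


def deconv_1d(signal, kernel, stride):
    """Transposed convolution with pad 0: each input sample scatters a
    scaled copy of the kernel at a multiple of the stride."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, v in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i * stride + j] += v * w
    return out


k = caffe_bilinear_kernel_1d(4)            # stride-2 kernel: [0.25, 0.75, 0.75, 0.25]
up = deconv_1d([0.0, 4.0, 8.0], k, stride=2)
print(up)                                  # interior samples 1, 3, 5, 7 are a linear ramp
```

Away from the borders the output is exactly linear interpolation between input samples; only the edges fade toward zero because of the implicit zero padding. With kernel_size 63 and stride 32 the kernel peaks at index 31, so a dedicated GPU bilinear-resize kernel with the matching offset could replace such a layer.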

And then you could support some codebase that makes it easier to slot in “plug-in” operator layers, like, say, Caffe2.

And then you could put pressure on the DIGITS folks to provide their nice GUI for that codebase …


I have the same issue running TensorRT 2.1 on an x86 computer with a GTX 1070. I haven’t tested it on a Jetson TX2 yet.

I used FCN-Alexnet with a custom dataset of high-resolution 2000x2000 pictures, following the tutorial in the jetson-inference repo. In DIGITS, inference works properly and the segmentation is almost perfect. But with TensorRT I get the same kind of results as S4WRXTTCS: the labels are foggy and completely wrong.

Is there something wrong with TensorRT 2.1?

Best Regards.

Hi Austriker, are you able to run the pretrained Aerial model (fcn-alexnet-aerial-fpv-720p) on the drone_*.png images that come with the repo?
There is also a pretrained Cityscapes model you can test on this image:


If you are unable to get a sensible output from the pretrained models, that may provide a hint as to the source of the issue.

It may also be valuable to visualize the raw output of the network instead of the bilinearly interpolated overlay. See here for an older version of segNet.cpp that contained a more basic overlay routine you can cut and paste. This may help determine whether you have a network issue or a post-processing issue.
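A quick way to look at the raw output without any overlay is to take the per-cell argmax of the coarse score grid and count how often each class wins. This is a minimal sketch assuming you have copied the output blob into a nested list with a channel-major [C][H][W] layout (as Caffe-style blobs are laid out); the function names are hypothetical.

```python
from collections import Counter


def raw_class_map(scores):
    """Per-cell argmax over the coarse output grid, with no upscaling or blending.

    scores: nested list shaped [C][H][W].
    """
    C, H, W = len(scores), len(scores[0]), len(scores[0][0])
    return [[max(range(C), key=lambda c: scores[c][y][x]) for x in range(W)]
            for y in range(H)]


def class_fractions(class_map):
    """Fraction of coarse cells assigned to each class index."""
    counts = Counter(c for row in class_map for c in row)
    total = sum(counts.values())
    return {c: n / total for c, n in sorted(counts.items())}


def print_class_map(class_map):
    """Print one character per cell: the class index, or '+' for indices >= 10."""
    for row in class_map:
        print("".join(str(c) if c < 10 else "+" for c in row))
```

If one class already wins in nearly every coarse cell (an all-one-color overlay, for example), the problem lies in the network or its deployment rather than in the overlay post-processing.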

Hi Dusty,

Thank you for your answer.

I have tested both examples on x86 and on the Jetson TX2, and they both work.
The only difference is that my custom network has only 2 outputs: the background and the object I am searching for. The result is that the image is completely red, which is the color of my label.

I am retraining my network with 3 classes (so that the first class goes unused) to see if that solves the issue.

Hi Dusty,

I have tested segnet-console with a model retrained with 3 classes instead of 2, and the segmentation works fine now.

Best regards

Hi Austriker,
I read your answer and understood that you had tried running FCN-8 through TensorRT. Could you explain in some detail how to run FCN-8 through TensorRT? I’m from China and my English is not good, so it’s hard for me to study the NVIDIA documentation, and I can hardly find a user guide for TensorRT. I really need help using FCN-8 with TensorRT. Can anyone help me? Thanks!

Best regards


You can run a custom model with this command:
GitHub - dusty-nv/jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.


OK, I have seen the command and have already solved the problem with FCN-AlexNet. Thank you very much! But FCN-8s has three layers:
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0.0
  }
  convolution_param {
    num_output: 21
    bias_term: false
    kernel_size: 63
    group: 21
    stride: 32
    weight_filler {
      type: "bilinear"
    }
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore"
  bottom: "data"
  top: "score"
  crop_param {
    axis: 2
    offset: 18
  }
}

Can you tell me how to run it in TensorRT? I deleted these layers, but it doesn’t seem to work and produces errors.
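One possible workaround, assuming you delete the `upscore` and `score` layers and point `--output_blob` at `score_fr`, is to reproduce those two layers in post-processing. With a bilinear-filled deconvolution whose kernel_size is about twice its stride (63 and 32 here), coarse cell i peaks at upscore pixel 32*i + 31, and the Crop layer shifts that by 18, so output pixel x maps to grid coordinate (x - 13) / 32. This is a hypothetical sketch, not jetson-inference code; unlike the real deconvolution, which fades toward zero at the image borders, it clamps at the borders.

```python
def _bilinear(grid, gy, gx):
    """Bilinearly sample grid[H][W] at fractional (gy, gx), clamping at the borders."""
    h, w = len(grid), len(grid[0])
    gy = min(max(gy, 0.0), h - 1.0)
    gx = min(max(gx, 0.0), w - 1.0)
    y0, x0 = int(gy), int(gx)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = gy - y0, gx - x0
    top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
    bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def upscore_crop_on_host(score_fr, out_h, out_w,
                         stride=32, kernel_size=63, crop_offset=18):
    """Host-side stand-in for Deconvolution(bilinear, group=C) + Crop(offset) + argmax.

    score_fr: coarse class scores shaped [C][H][W]. Returns out_h x out_w labels.
    """
    C = len(score_fr)
    # Kernel peak sits at (kernel_size - 1) / 2 within each stride step;
    # the Crop layer shifts it left by crop_offset: 31 - 18 = 13 for these values.
    center = (kernel_size - 1) / 2 - crop_offset
    labels = []
    for oy in range(out_h):
        gy = (oy - center) / stride
        row = []
        for ox in range(out_w):
            gx = (ox - center) / stride
            # group=C means every class channel is upsampled independently
            interp = [_bilinear(score_fr[c], gy, gx) for c in range(C)]
            row.append(max(range(C), key=interp.__getitem__))
        labels.append(row)
    return labels
```

Set `out_h`/`out_w` to the size of the `data` input, since that is the blob the Crop layer uses as its reference shape.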


Are you using TensorRT 3?
If not, could you upgrade TensorRT first and check whether the issue remains?


Hi AastaLLL,
I did use TensorRT 3, but it has no example for FCN-8s; it provides three FCN-AlexNet examples. Another question:
In one of the example folders, FCN-Alexnet-Aerial-FPV-720p, why does fcn_alexnet.deploy.prototxt use:
layer {
  name: "score_fr_21classes.pruned"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr_21classes"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 21
    pad: 0
    kernel_size: 1
  }
}

The fpv-labels.txt file has 3 labels, not 21. Why is num_output 21 rather than 3? The original.prototxt also has num_output: 21. Did you train it in DIGITS with num_output: 21? Why not num_output: 3?

Best regards!