Semantic Segmentation_Outdoor Navigation with Segnet and Jetson Nano

Hello Everybody,

This topic is a continuation of this link.

I’m using a modified Segnet script from here and here:

I’m attaching the modified code and other supporting files as well: (194.3 KB)

I followed the guidance here, provided an input image of 576x320 pixels, and used this command:
$ ./python3 --network=fcn-resnet18-deepscene-576x320 input.jpg output.jpg
(The input & output pictures are attached in the zip file.)

I was interested in the output matrix/array that contains just the annotated pixel labels (0-4; especially 0 -> trail) for navigation purposes.

Unfortunately, instead of a 576x320 label matrix I got only an 18x10 label matrix.
Like this:

I attached the full logfile.txt in the zip file as well.

My questions are:
How can I get a higher-resolution label matrix? Is 576x320 possible?
Maybe I did something wrong?

Thank You very much for Your help in advance.

With Regards,


It seems that the mask size is set to the grid size rather than the image size.
Could you try setting it to the image size to check whether that meets your requirement?


Aasta is correct: if you pass a larger image to segNet.Mask(), it will upscale the results from the raw grid size (18x10) to your image size (576x320). However, for the class IDs this interpolation uses nearest-neighbor point filtering, so you are only replicating existing data without gaining additional information.
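To illustrate why nearest-neighbor upscaling adds no information, here is a minimal NumPy sketch. It does not call segNet.Mask(); the random grid is a made-up stand-in for the 18x10 network output, and the exact pixel mapping inside the library may differ from this simple block replication.

```python
import numpy as np

# Hypothetical 10x18 (rows x cols) grid of class IDs 0-4,
# standing in for the raw network output.
rng = np.random.default_rng(0)
grid = rng.integers(0, 5, size=(10, 18), dtype=np.uint8)

# Nearest-neighbor upscale by an integer factor:
# every grid cell is replicated into a scale x scale block.
scale = 32  # 576 / 18 == 320 / 10 == 32
upscaled = np.repeat(np.repeat(grid, scale, axis=0), scale, axis=1)
print(upscaled.shape)  # (320, 576)

# Taking one pixel per block recovers the original grid exactly,
# so the large array carries no information beyond the 18x10 grid.
recovered = upscaled[::scale, ::scale]
print(np.array_equal(recovered, grid))  # True
```

In other words, the 576x320 label array is convenient for indexing by pixel, but any decision made on it could equally be made on the 18x10 grid.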

I modified the code like this (no other changes compared to the original version):

grid_width = 576
grid_height = 320
class_mask = jetson.utils.cudaAllocMapped(width=grid_width, height=grid_height, format="gray8")
class_mask_np = jetson.utils.cudaToNumpy(class_mask)
net.Mask(class_mask, grid_width, grid_height)

As a result, I’m getting a big array in class_mask_np, which is a good sign :-)
class_mask_np.shape comes out as (320,576,1), with the last dimension holding the labels (0-4) :-)
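For the navigation use case, a label array of this shape can be turned into a trail mask with plain NumPy. The sketch below is only an illustration: the array is synthetic (not real network output), class 0 is taken to be the trail as described earlier in the thread, and the "offset from image center" measure is a hypothetical example of what one might compute, not something from the original script.

```python
import numpy as np

# Synthetic stand-in for class_mask_np: shape (320, 576, 1), labels 0-4.
class_mask_np = np.full((320, 576, 1), 2, dtype=np.uint8)
class_mask_np[160:, 192:384, 0] = 0  # pretend the trail fills the lower middle

# Drop the trailing channel axis to get a 2D label image.
labels = class_mask_np[:, :, 0]  # shape (320, 576)

# Boolean mask of trail pixels (class ID 0).
trail = labels == 0

# Fraction of the image covered by trail.
trail_fraction = trail.mean()

# Hypothetical steering cue: horizontal offset of the trail centroid
# from the image center, in pixels (negative = trail is to the left).
rows, cols = np.nonzero(trail)
offset = cols.mean() - (labels.shape[1] - 1) / 2 if cols.size else None

print(trail_fraction, offset)
```

With a real mask, the same lines would run unchanged on the array returned by jetson.utils.cudaToNumpy(), after the network has processed a frame.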

However, the resolution of the overlay image and the mask image has not changed. It is still rough. Why? I would like to see a higher-resolution segmented video like this:
Higher resolution
Because later I would like to detect a particular area, calculate the orientation, draw rectangles/lines on the image, etc. If I can’t see the result, I can’t tell whether the program is working correctly.

Does the class_mask_np array have true resolution, or is it just up-scaled from the rough mask image (18x10 resolution)?

In the sample, the resolution of the mask image is half the resolution of the input. You can change that here so it stays the original size:

It is just up-scaled from the rough grid dimensions (18x10) using nearest-neighbor point filtering. The 18x10 grid is the output of the network. So it isn’t really producing more useful information by upscaling it, aside from visualization purposes.

Dear Dusty,

thank You very much for Your answer.
The size of the buffers.mask (288x160) seems to be okay, but I would like to have more detailed, segmented picture and more detailed label matrix (buffers.mask & class_mask_np) to have more accurate navigation.

I assume that for this the 18x10 grid would need to be, for example, four times bigger in x and y (16 times in total): 72x40, but genuinely detailed and not ‘just’ up-scaled.
Is that possible?
Just for clarification:

For the class IDs (class_mask_np), it is just scaled up. For buffers.mask (the colorized mask), it uses bilinear interpolation to give a better appearance; however, you are blending classes together and not really gaining detail.
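The difference between the two filters can be seen on a single row of grid cells. This is a rough 1D sketch using np.interp as a stand-in for bilinear filtering; the two palette colors are made up for illustration and are not the colors the model actually uses.

```python
import numpy as np

# Hypothetical RGB colors for two classes (not the model's real palette).
palette = np.array([[200.0, 155.0, 75.0],   # class 0
                    [0.0, 120.0, 0.0]])     # class 1

# One row of the coarse grid: two cells of class 0, then two of class 1.
ids = np.array([0, 0, 1, 1])
colors = palette[ids]  # shape (4, 3)

scale = 4
x_src = np.arange(ids.size)
x_dst = np.linspace(0, ids.size - 1, ids.size * scale)

# Nearest-neighbor on class IDs: output values are always valid IDs.
nearest_ids = ids[np.rint(x_dst).astype(int)]

# Linear interpolation on colors, channel by channel.
blended = np.stack(
    [np.interp(x_dst, x_src, colors[:, c]) for c in range(3)], axis=1
)

# Count interpolated pixels whose color matches no class in the palette.
in_palette = np.isclose(blended[:, None, :], palette[None, :, :]).all(axis=2).any(axis=1)
print(np.unique(nearest_ids), int((~in_palette).sum()))
```

The blended pixels near the class boundary look smoother, but their colors belong to no class, which is why the colorized mask should not be used to read labels back.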

I believe the picture on the right from your post is the ground-truth from the dataset. It can be hard to achieve that level of segmentation detail without increasing the input resolution of the network (which will linearly increase the size of the output matrix). In this case, that would mean using a larger-resolution dataset since this model was already trained at the full size of the dataset (or increasing the resolution of this dataset, which is probably not ideal). The DeepScene dataset was one of the lower-resolution datasets I tried for segmentation, which means that it has a smaller output mask (on the other hand, Cityscapes is 2048x1024 resolution).
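As a back-of-the-envelope check of how output grid size follows input resolution: the 576x320 input and 18x10 grid reported in this thread imply a downsampling factor of 32 in each dimension. The helper below is hypothetical and assumes that same factor holds for other input sizes and models, which may not be true for every network.

```python
def output_grid(width, height, stride=32):
    """Estimate the raw output grid for a given input size.

    stride=32 is inferred from 576x320 -> 18x10 in this thread;
    applying it to other models is an assumption.
    """
    return width // stride, height // stride


def input_for_grid(grid_w, grid_h, stride=32):
    """Inverse estimate: input size needed for a desired grid."""
    return grid_w * stride, grid_h * stride


print(output_grid(576, 320))     # (18, 10)
print(output_grid(2048, 1024))   # (64, 32)
print(input_for_grid(72, 40))    # (2304, 1280)
```

Under that assumption, the 72x40 grid asked about above would need roughly a 2304x1280 input, which is larger than the images in the current dataset.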

I saw a new off-road segmentation dataset was recently announced, which appears to be higher resolution:

I haven’t tried training a DNN with it, but perhaps that could be another option.