Free space detection using jetson inference segmentation

Hello All,

I am new to jetson-inference and am researching whether it is possible to use jetson-inference segmentation for free space detection. It is for a robotics application. I have not found much material on this. Could anyone here provide some information or URLs to study?

Please refer to NVIDIA Isaac SDK | NVIDIA Developer

Thank you so much for your response. Our project has a restriction against using Isaac. The team is already using jetson-inference object detection and would like to continue with the same library, so I am looking for something similar: jetson-inference segmentation + free space/obstacle detection.

May I know what’s the Jetson platform you’re using?

Hello, it is the Jetson AGX Xavier.


It is possible.
Here is a semantic segmentation model for your reference:

This model can be retrained directly with the TLT tool.


Thank you so much. I will go through it and get back.


To do freespace detection, you would focus on the segmentation class like ‘road’, ‘floor’, or ‘trail’. Some freespace models are 2-class (free or occluded), but you can also try using existing models like Cityscapes which have more classes (and just focus on the road).

For example, with Cityscapes model you would follow the purple road/ground. With DeepScene model you would follow the brown trail. With SUN-RGBD indoor model you could follow the green floor area. You could also train your own model particular to your scenario to improve accuracy.

Thank you so much. Let me have a try.

Hello Dusty, thank you so much for your support. I was wondering whether I can pass multiple classes to the ignore-class argument, or whether I need to modify the code.
Ex: I tried running the command below to ignore both wall and floor, but it did not work. It fell back to the default case and did not ignore anything.
./ --network=fcn-resnet18-sun --ignore-class='wall floor' images/.mp4 images/test/output_.mp4

As you mentioned above, if I want to focus on 'floor' only, I would have to pass all the remaining classes in the ignore-class argument. Is my understanding correct?

If you could help me edit the Python code so that it focuses on floor only, that would be great. I know it is a very basic question, but I think I am missing some small trick. It would help me solve the actual problem.


I think you would actually not want to use the ignore-class option for this, because it would re-classify those pixels as the next-most-likely class. You still want them to be classified correctly.

When you go to process the output, you would likely want to use the class ID mask from segNet.Mask(). This gives you back a single-channel uint8 image that has the class ID for each pixel (instead of a color). You would then just look for all the pixels with the same class ID as 'floor'. Here is an example of that being used:

For visualization and the colorized image, if you only wanted to see two classes (free and occluded), you could just set the colors of all the other classes to the same color in your model’s color file.
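To make the idea above concrete, here is a minimal NumPy sketch of the floor-pixel lookup. It assumes you have already obtained the class ID mask (e.g. from segNet.Mask() into a gray8 buffer, converted with cudaToNumpy); the `find_free_space` name and the `floor_id` value are illustrative, not part of the jetson-inference API.

```python
import numpy as np

def find_free_space(class_mask, floor_id):
    """Return a boolean free-space mask and the fraction of free pixels.

    class_mask: 2-D uint8 array of per-pixel class IDs
    floor_id:   class ID of the traversable class (e.g. 'floor')
    """
    free = (class_mask == floor_id)   # True where the pixel is floor
    ratio = free.mean()               # fraction of the frame that is free
    return free, ratio

# Synthetic 4x4 mask for illustration: class 2 = floor, others = obstacles
mask = np.array([[2, 2, 1, 1],
                 [2, 2, 1, 3],
                 [2, 2, 2, 2],
                 [0, 0, 2, 2]], dtype=np.uint8)

free, ratio = find_free_space(mask, floor_id=2)
print(ratio)  # 0.625 (10 of 16 pixels are floor)
```

In practice you would look up the real floor class ID from your model's labels file rather than hard-coding it.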

Hello Dusty,

Thank you so much. I understand; I will try to modify the code as per your suggestion.


Hello Dusty,
Thank you for your inputs; I was able to get the pixels for floor.
But while randomly testing the images provided on GitHub, I found a mismatch in the histogram data when using the --stats option.
Ex: room_1.jpg
'floor' count and percentage are 0, whereas cabinet/shelves is 51 and 0.24% respectively.
Command used:
python ./ --network=fcn-resnet18-sun ./room_1.jpg ./output_room_1.jpg --stats


'floor' count and percentage are 0, whereas cabinet/shelves is 58 and 0.27% respectively. It is also detecting other objects.


In both cases the color classification is the same, but there is a problem with the histogram data.

Am I making a mistake somewhere?

Could you please give some input?

Udaykiran Patnaik.

Thanks @uday.patnaik, I will have to look into it. My bet is something is off with the histogram (or with how I was using the histogram function). The histogram was just meant as an example of using the class mask. The color mask/overlay is generated from the class mask, so I think the class mask is OK to keep using. You would just look for the class IDs that you want (floor) in the class mask.


Hello Dusty,
Thank you so much for your support so far.
I have a few more questions.
I am using a Logicool C270n HD 720p camera in my project.

  1. In the example code on GitHub, it is written as self.grid_width, self.grid_height = net.GetGridSize(). I guess this gives the class ID per grid cell, but I want the class ID per pixel instead. In that case, should I use a buffer size of 1280x720 in the line
    self.class_mask = jetson.utils.cudaAllocMapped(width=self.grid_width, height=self.grid_height, format="gray8")
    Could you please tell me whether my understanding is correct?

  2. Due to a project requirement I need to create an OpenCV image in my application, so I am using cv.imshow().
    For a still image it works fine, but when I pass an mp4 file or take a live video feed from the camera above, cv.imshow() shows a blank (white) screen. What is the best method to read an image and convert it to OpenCV? I am using test code like the following to read the video feed, convert it to OpenCV, and display it:
    img = jetson.utils.cudaToNumpy(input.Capture(), 1280, 720, 4)
    img = cv.cvtColor(img, cv.COLOR_RGBA2RGB).astype(np.uint8)
    img = cv.cvtColor(img, cv.COLOR_RGB2BGR)
    img_h, img_w, img_c = img.shape
    cv.imshow("original", img)

I get the output image below:

So, am I doing something wrong in the conversion? The strange thing is that it works absolutely fine with a still image, e.g. room_1.jpeg. Does it have something to do with the frame rate? I have no idea. Could you please help me?

Udaykiran Patnaik.

In theory that is correct. That class ID mask will be resized (using nearest-neighbor sampling) to whatever the size of the buffer is. However this also means that you aren’t really gaining any information by upsampling to your full camera resolution, while only increasing your processing load. It may be advisable to change the aspect ratio to match your camera, but to keep the resolution small - like 160x90.
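To illustrate the point above, here is a small NumPy sketch (not jetson-inference code) showing why nearest-neighbor upsampling of a class mask adds no information: the per-class pixel fractions are identical before and after, so a small 16:9 buffer such as 160x90 carries the same free-space statistics as a full 1280x720 one.

```python
import numpy as np

def upsample_nearest(mask, scale):
    """Nearest-neighbor upsample an integer class mask by an integer factor."""
    return np.repeat(np.repeat(mask, scale, axis=0), scale, axis=1)

# Small class mask: class 2 = floor, class 1 = obstacle
small = np.array([[2, 2, 1],
                  [2, 1, 1]], dtype=np.uint8)

big = upsample_nearest(small, 8)   # 2x3 -> 16x24

# The fraction of 'floor' pixels is identical at both resolutions
print((small == 2).mean())  # 0.5
print((big == 2).mean())    # 0.5
```

The only cost of the larger buffer is extra memory traffic and processing, which is why a small aspect-ratio-matched mask is preferable.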

Can you try changing the code to something like this?

img = input.Capture()
img = jetson.utils.cudaToNumpy(img, 1280, 720, 4)

Hello Dusty,
Thank you so much. I understood.

Udaykiran Patnaik.

Hello Dusty,

Thank you for your support so far; I need some more help.
My project requirements have changed. I have been asked to take an image or video from OpenCV, apply jetson-inference to segment the video, convert it back to OpenCV, and display or save it.

I was able to do this successfully, but we observed that the segmented output is not as fine as it is without OpenCV. As a result there are many pockets inside individual frames of the video where the floor is not detected correctly, so the robot may stop thinking there is an obstacle where it should have moved ahead.
I would really appreciate your help. What we observed is that the final OpenCV output image contains grid cells of larger size, leaving many open areas that are not captured as floor. You can think of it like small stones filling a glass more densely than large stones.

Here is my code below:

  cap = cv2.VideoCapture(0)
  ret, frame = cap.read()
  frame_rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
  cuda_frame = jetson.utils.cudaFromNumpy(frame_rgba)

  # process the segmentation network
  num_classes = net.GetNumClasses()
  img = jetson.utils.cudaToNumpy(cuda_frame, img_width, img_height, 4)
  img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB).astype(np.uint8)
  img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

  # allocate buffer for the class mask
  class_mask = jetson.utils.cudaAllocMapped(width=img_width, height=img_height, format="gray8")

  # get the class mask (each pixel contains its class ID)
  net.Mask(class_mask, img_width, img_height)
  class_mask_np = jetson.utils.cudaToNumpy(class_mask)

  # find where each class occurs in the mask
  arr = np.array(class_mask_np)
  img = cv2.resize(img, (img_width, img_height), interpolation=cv2.INTER_LINEAR)
  output = img.copy()

  # color the pixels belonging to the requested class ID
  if args.classid == 99:
    # 99 = color all classes
    for n in range(num_classes):
      valid = np.all(arr == n, axis=-1)
      rs, cs = valid.nonzero()
      colorCode = net.GetClassColor(n)
      output[rs, cs, :] = [colorCode[0], colorCode[1], colorCode[2]]
  else:
    # color only the single requested class
    valid = np.all(arr == args.classid, axis=-1)
    rs, cs = valid.nonzero()
    colorCode = net.GetClassColor(args.classid)
    output[rs, cs, :] = [colorCode[0], colorCode[1], colorCode[2]]

  overlayed_image = cv2.addWeighted(img, 0.5, output, 0.5, 0)
  cv2.imshow("overlayed_image", overlayed_image)

Are you trying to get the colorized mask? If so, just pass an rgb8 image to segNet.Mask() instead of the gray8 image. Then it will give you the color mask back.
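Under the hood, the colorized mask is essentially a palette lookup on the class ID mask. If you ever need to colorize the gray8 mask yourself, e.g. with a custom two-color free/occluded palette as discussed earlier in the thread, a NumPy sketch looks like this; the palette values here are illustrative, not taken from any model's actual color file.

```python
import numpy as np

# Illustrative palette: class 0 -> gray, 1 -> red (obstacle), 2 -> green (floor)
palette = np.array([[128, 128, 128],
                    [255,   0,   0],
                    [  0, 255,   0]], dtype=np.uint8)

# Per-pixel class IDs (e.g. from segNet.Mask() into a gray8 buffer)
class_mask = np.array([[2, 2, 1],
                       [2, 0, 1]], dtype=np.uint8)

# Fancy indexing maps each class ID to its RGB color in one step
color_mask = palette[class_mask]   # shape (2, 3, 3), dtype uint8
print(color_mask[0, 0])            # [  0 255   0] -> floor pixel is green
```

Setting several palette rows to the same color reproduces the two-class (free/occluded) visualization mentioned earlier, without retraining anything.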

Hello Dusty,

Perfect! It's working as expected now.

Thanks and Regards,
Udaykiran Patnaik.