Free space detection using jetson-inference segmentation

Hello Dusty, thank you so much for your support. I was just wondering: can I pass multiple classes to the ignore-class argument, or do I need to modify the code?
For example, I tried the command below to ignore both wall and floor, but it did not work. It fell back to the default case and did not ignore anything.
./segnet.py --network=fcn-resnet18-sun --ignore-class='wall floor' images/.mp4 images/test/output_.mp4

As you mentioned above, if I want to focus on 'floor' only, I would have to pass all the remaining classes in the ignore-class argument. Is my understanding correct?

If you could help me edit the Python code so that it focuses on the floor only, that would be great. I know it is a very basic question, but I think I am missing some small trick. It would help me solve my actual problem.

Regards,
Udaykiran.

I think you would actually not want to use the ignore-class option for this, because it would re-classify those pixels as the next-most-likely class. You still want them to be classified correctly.

When you go to process the output, you would likely want to use the class ID mask from segNet.Mask(). This gives you back a single-channel uint8 image that has the class ID for each pixel (instead of a color). You would then just look for all the pixels with the same class ID as 'floor'. Here is an example of that being used:

https://github.com/dusty-nv/jetson-inference/blob/6e078d21396298c3a9f1f1ea1f2c27bf80bbd4a6/python/examples/segnet_utils.py#L80
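
For reference, here is a minimal sketch of that approach on a single image (FLOOR_CLASS_ID below is a placeholder - look up the actual ID in your model's classes.txt; this assumes jetson.inference and jetson.utils are installed):

import jetson.inference
import jetson.utils

# load the segmentation network and a test image
net = jetson.inference.segNet("fcn-resnet18-sun")
img = jetson.utils.loadImage("room_1.jpg")

# run segmentation
net.Process(img)

# allocate a small single-channel (gray8) buffer and fetch the class ID mask into it
grid_width, grid_height = net.GetGridSize()
class_mask = jetson.utils.cudaAllocMapped(width=grid_width, height=grid_height, format="gray8")
net.Mask(class_mask, grid_width, grid_height)
jetson.utils.cudaDeviceSynchronize()

# map the mask into numpy and pick out the pixels classified as 'floor'
FLOOR_CLASS_ID = 4  # hypothetical value - check classes.txt for fcn-resnet18-sun
mask_np = jetson.utils.cudaToNumpy(class_mask)
floor_pixels = (mask_np.squeeze() == FLOOR_CLASS_ID)

print("floor coverage: {:.1f}%".format(100.0 * floor_pixels.mean()))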

For visualization and the colorized image, if you only wanted to see two classes (free and occluded), you could just set the colors of all the other classes to the same color in your model’s color file.

Hello Dusty,

Thank you so much. I understand; I will try to modify the code as per your suggestion.

Regards,
Udaykiran.

Hello Dusty,
Thank you for your inputs; I was able to get the pixels for the floor.
But while randomly testing the images provided on GitHub, I found a mismatch in the histogram data when using the --stats option.
Example: room_1.jpg
The 'floor' count and percentage are 0, whereas cabinet/shelves is 51 and 0.24% respectively.
Command used:
python ./segnet.py --network=fcn-resnet18-sun ./room_1.jpg ./output_room_1.jpg --stats

[image attachment]

Example 2: room_4.jpg
The 'floor' count and percentage are 0, whereas cabinet/shelves is 58 and 0.27% respectively. It is also detecting other objects.

[image attachment]

In both cases the color classification is the same, but there is a problem with the histogram data.

Am I making some mistake?

Could you please give me some input?

Regards,
Udaykiran Patnaik.

Thanks @uday.patnaik, I will have to look into it. My bet is something is off with the histogram (or with how I was using the histogram function). The histogram was just meant as an example of using the class mask. The color mask/overlay is generated from the class mask, so I think the class mask is ok to keep using. You would just look for the class IDs that you want (floor) in the class mask.


Hello Dusty,
Thank you so much for your support so far.
I have a few more questions.
I am using a Logicool C270n HD 720p camera in my project.

  1. In the example code on GitHub, it's written as self.grid_width, self.grid_height = net.GetGridSize(). I guess this gives the class ID per grid cell, but I want the class ID per pixel instead of per grid cell. In that case, should I use a buffer size of 1280, 720 in the following line?
    self.class_mask = jetson.utils.cudaAllocMapped(width=self.grid_width, height=self.grid_height, format="gray8")
    Could you please tell me whether my understanding is correct?

  2. Due to a project requirement I need to create an OpenCV image in my application, so I am using cv.imshow().
    For a still image it works fine, but when I pass an mp4 file or take a live video feed from the above camera, cv.imshow() shows a blank (white) screen. What is the best way to read an image and convert it to OpenCV? I am using test code something like the following to read the video feed, convert it to OpenCV, and show it:
    img = jetson.utils.cudaToNumpy(input.Capture(), 1280, 720, 4)
    img = cv.cvtColor(img, cv.COLOR_RGBA2RGB).astype(np.uint8)
    img = cv.cvtColor(img, cv.COLOR_RGB2BGR)
    img_h, img_w, img_c = img.shape
    cv.imshow("original", img)
    cv.waitKey(1)

I get output image as below:
[image attachment]

So, am I doing something wrong in the conversion? The strange thing is that it works absolutely fine with a still image, e.g. room_1.jpeg. Is it something to do with the frame rate? I have no idea. Could you please help me?

Regards,
Udaykiran Patnaik.

In theory that is correct. That class ID mask will be resized (using nearest-neighbor sampling) to whatever the size of the buffer is. However this also means that you aren’t really gaining any information by upsampling to your full camera resolution, while only increasing your processing load. It may be advisable to change the aspect ratio to match your camera, but to keep the resolution small - like 160x90.
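
As a rough sketch of that suggestion (160x90 is just the example size from above, and the variable names are assumed to match your code):

mask_width, mask_height = 160, 90   # small buffer matching the camera's 16:9 aspect ratio
class_mask = jetson.utils.cudaAllocMapped(width=mask_width, height=mask_height, format="gray8")
class_mask_np = jetson.utils.cudaToNumpy(class_mask)

# after net.Process(frame), the class IDs are resampled into the small buffer
net.Mask(class_mask, mask_width, mask_height)
jetson.utils.cudaDeviceSynchronize()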

Can you try changing the code to something like this?

img = input.Capture()
jetson.utils.cudaDeviceSynchronize()
img = jetson.utils.cudaToNumpy(img, 1280, 720, 4)
...
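
Putting it together, the capture loop from your post would then look something like this (sketch only, assuming input is a jetson.utils.videoSource opened at 1280x720):

while True:
    cuda_img = input.Capture()                    # frame stays in CUDA memory
    jetson.utils.cudaDeviceSynchronize()          # make sure capture/conversion has finished
    img = jetson.utils.cudaToNumpy(cuda_img, 1280, 720, 4)
    img = cv.cvtColor(img, cv.COLOR_RGBA2RGB).astype(np.uint8)
    img = cv.cvtColor(img, cv.COLOR_RGB2BGR)
    cv.imshow("original", img)
    if cv.waitKey(1) & 0xFF == ord('q'):
        break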

Hello Dusty,
Thank you so much. I understood.

Regards,
Udaykiran Patnaik.

Hello Dusty,

Thank you for your support till now.
I need some more support.
My project requirements have changed. They have asked me to take an image or video through OpenCV, apply jetson.utils to segment the video, convert it back to OpenCV, and display or save it.

I was able to do this successfully, but what we have observed is that the segmented output is not as fine as it is without OpenCV. As a result, there are many pockets inside individual frames of the video where the floor is not detected correctly. So the robot may stop, thinking there is an obstacle, when it should have moved ahead.
I would really appreciate your help. What we have observed is that the final OpenCV output image contains larger grid cells, which leaves many open areas that are not captured as floor. Think of it like filling a glass: small stones pack it more densely than large stones.

Here is my code below:

  cap = cv2.VideoCapture(0)
  ret, frame = cap.read()
  frame_rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
  cuda_frame = jetson.utils.cudaFromNumpy(frame_rgba)

  # process the segmentation network
  net.Process(cuda_frame)
  num_classes = net.GetNumClasses()
  jetson.utils.cudaDeviceSynchronize()
  img = jetson.utils.cudaToNumpy(cuda_frame, img_width, img_height, 4)
  img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB).astype(np.uint8)
  img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR) 

  # Allocate buffer for mask
  class_mask = jetson.utils.cudaAllocMapped(width=img_width, height=img_height, format="gray8")
  class_mask_np = jetson.utils.cudaToNumpy(class_mask)

  # get the class mask (each pixel contains the classID for itself)
  net.Mask(class_mask, img_width, img_height)
  class_mask_np = jetson.utils.cudaToNumpy(class_mask)

  # compute the number of times each class occurs in the mask
  arr = np.array(class_mask_np)            	
  img = cv2.resize(img, (img_width, img_height), interpolation = cv2.INTER_LINEAR) 
  output = img.copy()

  # Color the pixel with green for those representing a class_id 
  if args.classid == 99:
    for n in range(num_classes):
      valid = np.all(arr == n, axis = -1)
      rs, cs = valid.nonzero()
      colorCode = net.GetClassColor(n)
      output[rs, cs, :] = [colorCode[0],colorCode[1],colorCode[2]]
  else:
    valid = np.all(arr == args.classid, axis = -1)
    rs, cs = valid.nonzero()
    colorCode = net.GetClassColor(args.classid)
    output[rs, cs, :] = [colorCode[0],colorCode[1],colorCode[2]]
  overlayed_image = cv2.addWeighted(img,0.5,output,0.5,0)
  cv2.imshow("overlayed_image", overlayed_image)

Are you trying to get the colorized mask? If so, just pass an rgb8 image to segNet.Mask() instead of the gray8 image. Then it will give you the color mask back.
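
A minimal sketch of that change, reusing the variables from your code above (img is assumed to be the BGR frame at img_width x img_height):

# allocate an rgb8 buffer instead of gray8 to receive the colorized mask
color_mask = jetson.utils.cudaAllocMapped(width=img_width, height=img_height, format="rgb8")
color_mask_np = jetson.utils.cudaToNumpy(color_mask)

net.Process(cuda_frame)
net.Mask(color_mask, img_width, img_height)       # rgb8 buffer -> colorized mask
jetson.utils.cudaDeviceSynchronize()

# convert to BGR and blend with the original frame for display
overlay = cv2.addWeighted(img, 0.5, cv2.cvtColor(color_mask_np, cv2.COLOR_RGB2BGR), 0.5, 0)
cv2.imshow("overlayed_image", overlay)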

Hello Dusty,

Perfect! It's working as expected now.

Thanks and Regards,
Udaykiran Patnaik.

OK great, glad you got it working how you wanted it to.

One item to point out, if you wish to run this on a realtime stream, it should help the performance to not allocate new CUDA buffers each frame. Instead you can do something like this:

class_mask = None
class_mask_np = None

while True:   # your camera loop
    ret, frame = cap.read()
    # your other code here

    if class_mask is None:
        class_mask = jetson.utils.cudaAllocMapped(width=img_width, height=img_height, format="gray8")
        class_mask_np = jetson.utils.cudaToNumpy(class_mask)
This way, the CUDA memory is only allocated once. Also, you only need to call cudaToNumpy() once per buffer - the mapping is persistent. Any changes you make in numpy will show up in CUDA memory, and vice versa (because it's mapped to the same memory). The same applies to your other cudaToNumpy() calls - you only need to do them once as well.
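
Putting that together with your earlier snippet, the loop structure might look something like this (a sketch; net, img_width/img_height, and args.classid follow the code posted above):

class_mask = None
class_mask_np = None

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # allocate and map the mask buffer only on the first frame
    if class_mask is None:
        class_mask = jetson.utils.cudaAllocMapped(width=img_width, height=img_height, format="gray8")
        class_mask_np = jetson.utils.cudaToNumpy(class_mask)

    # ... convert frame and call net.Process(cuda_frame), as in your code ...

    net.Mask(class_mask, img_width, img_height)   # writes into the same CUDA buffer every frame
    jetson.utils.cudaDeviceSynchronize()

    # class_mask_np already reflects the new mask - no further cudaToNumpy() needed
    floor_pixels = (class_mask_np.squeeze() == args.classid)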

Hello Dusty,

Thank you so much for your suggestion.
I updated code as per your suggestion.
It is working like a charm for 'class_mask', but it is not working for 'cuda_frame'.
At the end of this code, the img variable looks like a masked image. Instead, I want img to be the original image (with the CUDA optimization) so that I can use it together with the class_mask output to create an overlay image using OpenCV. Right now both are masked images, so the final overlaid image is also a masked image.

Can't I use the same CUDA optimization for cuda_frame? As suggested, I wanted to call cudaToNumpy() only once here for cuda_frame as well. I was able to get frames, but they were all masked.

Below is a snippet of the code I tried. It is an extension of the previously posted code.

  # Allocate buffer for cuda_frame
  if cuda_frame is None:
    cuda_frame = jetson.utils.cudaAllocMapped(width=img_width, height=img_height, format="rgba8")
    img = jetson.utils.cudaToNumpy(cuda_frame, img_width, img_height, 4)

  frame_rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
  cuda_frame = jetson.utils.cudaFromNumpy(frame_rgba)

  # process the segmentation network
  net.Process(cuda_frame)
  num_classes = net.GetNumClasses()
  jetson.utils.cudaDeviceSynchronize()
  img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB).astype(np.uint8)
  img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR) 

  # Allocate buffer for mask
  if class_mask is None:
    class_mask = jetson.utils.cudaAllocMapped(width=img_width, height=img_height, format="rgb8")
    class_mask_np = jetson.utils.cudaToNumpy(class_mask)
 
  # get the class mask (each pixel contains the classID for itself)
  net.Mask(class_mask, img_width, img_height, format="rgb8")

Hello Dusty,

Apart from the above clarification, I need another valuable input from your side.
If I can fix the above optimization issue, then I am almost done with the free-space segmentation task.

As part of my project I now have another task.

For my robotic application, my task is now to track a moving person on the free space detected above. I guess it has something to do with depth estimation, but I have no idea about it.

Since I am limited to using jetson-inference (as it is already used for object detection and free-space segmentation), is there any way to track a moving object or person using jetson-inference?
OR
Is there any way I can combine jetson-inference with another available tracking algorithm?

Any guidance or GitHub link to explore would be really helpful for achieving my goal.

Thanks and Regards,
Udaykiran Patnaik.

I think you may not be able to use it for cuda_frame, because that comes from your cv2.VideoCapture(), which returns a new frame each time. You also don't need to allocate CUDA memory yourself for this. I think you may just go back to:

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
frame_rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
cuda_frame = jetson.utils.cudaFromNumpy(frame_rgba)

Or really, you could just use jetson.utils.videoSource() and the frames will already be in CUDA memory for you.
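
A minimal sketch of that alternative (the device string is just an example):

# capture directly with jetson.utils - frames arrive already in CUDA memory
input = jetson.utils.videoSource("/dev/video0")

while True:
    cuda_frame = input.Capture()                  # no cv2.VideoCapture or cudaFromNumpy needed
    net.Process(cuda_frame)
    net.Mask(class_mask, img_width, img_height)
    jetson.utils.cudaDeviceSynchronize()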

Hello Dusty,

Thank you so much.
I understood.

I think this solves my free space detection problem as of now.

Thanks and Regards,
Udaykiran Patnaik.

I haven’t done tracking before with jetson-inference, but VPI (Vision Programming Interface) has a tracking algorithm: VPI - Vision Programming Interface: KLT Bounding Box Tracker

VPI doesn’t have a Python interface yet, that will be coming in a future version. DeepStream has tracking too.

Hello Dusty,
Thank you so much for your quick reply.
I understood.

Thanks and Regards,
Udaykiran Patnaik.

No problem - by the way, if you don't need temporal tracking, you could simply do it as you outlined in the other thread.

The VPI or DeepStream-based tracking would be for if you wanted certainty that the bounding box was the same person frame-to-frame (for example, if multiple people were in the camera frame).

"Tracking by detection" is essentially what was being referred to in the other thread - since these object detection DNNs are fairly accurate, they produce detection bounding boxes a good amount of the time (although there is noise). However, the object detection DNN doesn't know whether it is the same person in the bounding box - just that it is a person (any person).
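
As a rough illustration of tracking-by-detection with jetson-inference (a sketch only - it re-detects every frame and does not carry identity frame-to-frame, which is exactly the limitation described above; the network and device names are just examples):

import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("/dev/video0")

while True:
    img = camera.Capture()
    detections = net.Detect(img)

    # keep only the detections whose class is 'person'
    people = [d for d in detections if net.GetClassDesc(d.ClassID) == "person"]

    for d in people:
        # the box center could be compared against the floor mask from segNet
        print("person at center={} confidence={:.2f}".format(d.Center, d.Confidence))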

Hello Dusty,

Thank you so much for your reply.
I understood. I will try what I have outlined in the other thread.

Thanks and Regards,
Udaykiran Patnaik.