Hi, I am starting to play with segNet and the SUN RGB-D model as a way of navigating my toy robot around the house.
The demos gave me great hope that 'floor' could be recognised, which would be a good start toward navigating around obstacles.
My test images, all taken by the robot and therefore close to the ground, have all failed with network=fcn-resnet18-sun.
From previous trials (non-Jetson, no fancy AI) this was caused by light reflecting off the tiles and, to a lesser extent, by the grouting between the tiles (which I solved by blurring the image a bit).
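For reference, the grout-suppressing blur from my earlier trials was nothing more than a small smoothing kernel applied before any processing. A minimal NumPy-only sketch (grayscale input; the kernel size of 5 is a guess to tune against your tile/grout scale):

```python
import numpy as np

def box_blur(img, k=5):
    """Simple box blur to wash out thin grout lines before segmentation.
    Works on a 2D (grayscale) array; kernel size k is a tuning guess."""
    pad = k // 2
    padded = np.pad(img.astype(np.float32), pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    # Accumulate the k*k neighbourhood of every pixel, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return (out / (k * k)).astype(img.dtype)
```

In practice you would do the same per colour channel (or just use `cv2.GaussianBlur`) before handing the frame to the network.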
The output of the example program segnet.py also shows an issue where 'floor' is not detected right up to the front of the device.
Does anybody have any suggestions about how the input image could be 'doctored' so that it is more compatible with fcn-resnet18-sun?
It would be a lot of work to create my own model - apart from the photographic effort, every image would have to be edited to remove the rest of the room.
When it boils down to what I have envisaged, the only relevant classes are:
floor - where it can go
person - for other processing to see who
cat - to avoid or maybe to chase
anything else is ‘obstacle’ to be avoided
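Since everything outside those three categories is just 'obstacle', the network's full class list could be collapsed after inference rather than retrained. A sketch of that remapping on the per-pixel class-ID grid that segnet.py can produce (the source class IDs below are placeholders; the real ones come from the model's classes.txt):

```python
import numpy as np

# Placeholder source IDs -- look up the real indices in the model's
# classes.txt; these numbers are NOT the actual fcn-resnet18-sun labels.
FLOOR_ID, PERSON_ID = 2, 12

# Robot-level categories.
FREE, WHO, OBSTACLE = 0, 1, 2

def collapse_classes(class_grid):
    """Map a per-pixel class-ID array down to the three categories
    the robot cares about: free floor, person, everything-else obstacle."""
    out = np.full(class_grid.shape, OBSTACLE, dtype=np.uint8)  # default: avoid
    out[class_grid == FLOOR_ID] = FREE
    out[class_grid == PERSON_ID] = WHO
    return out
```

'cat' would need a model whose class list actually contains it, but the same one-line remap applies.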
Hi @jc5p, unfortunately I don't have a great idea of how to transform your imagery such that it works better with the pre-trained model. However, if you wanted to explore training your own segmentation model, here is a tutorial about that:
@AastaLLL Yes, that is where the info for the test came from. @dusty_nv Thanks for the pointer to how to train a segmentation model. As I suspected, it is a daunting and very time-consuming task - which is why I was thinking of cheating!
Do you know if you can get access to the images behind the pre-trained models and then reassemble them? For example, all the 'cat' and 'dog' images from the VOC set plus all the 'floor' and 'person' images from the SUN set, which could then be augmented by, in my case, extra 'floor' images?
I don't think it would be as simple as only choosing the "floor" images, for example, because an image typically contains several segmentation classes. What you may want/need to do is pre-process these datasets so the mask images keep only the classes you want, and discard images that contain none of those classes.
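That pre-processing step could look roughly like this, assuming VOC-style mask PNGs where each pixel holds a class index (the source IDs and the keep-set below are placeholders, not real dataset indices):

```python
import numpy as np

# Placeholder mapping: source class ID -> new ID in your combined dataset.
# Replace the keys with the real indices from each dataset's label map.
KEEP = {3: 1, 15: 2}
BACKGROUND = 0

def remap_mask(mask):
    """Keep only the wanted classes in a segmentation mask; all other
    pixels become background. Returns None when no wanted class appears,
    signalling that the image/mask pair should be discarded."""
    out = np.full(mask.shape, BACKGROUND, dtype=np.uint8)
    found = False
    for src_id, new_id in KEEP.items():
        hit = mask == src_id
        if hit.any():
            found = True
            out[hit] = new_id
    return out if found else None
```

Run that over every mask in each source dataset, write out the remapped masks that come back non-None, and you have a merged dataset with a small, consistent class list.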