Extract info (class ID, etc.) from Cityscapes segmentation

Hi,
I just want to know how to extract which class is detected by the Cityscapes segmentation example (segnet-camera.py).
I can see that it maps colors onto the model's detections, but I don't know how to get the class ID and other info from that in Python.
This code is being called:

# generate the overlay and mask

net.Overlay(img_overlay, width, height, opt.filter_mode)
net.Mask(img_mask, half_width, half_height, opt.filter_mode)

I know the mask image has each pixel as the class ID, but I don't know how to read that class ID in Python. Would you please help?

Hi,

The segmentation output is the mask image, so the pixel value at each position carries the class information.

You can just read the img_mask variable to get the class ID.

Thanks.

@AastaLLL
Would you please assist with how I can read img_mask to get the class ID?
I could use OpenCV to read each image's pixel array and try to get its color, but that is just my guess. Would you please elaborate on how I can read img_mask to get the class ID?

Something like this:
jetson.utils.cudaDeviceSynchronize()
image = jetson.utils.cudaToNumpy(img_overlay, width, height, 4)
(r, g, b , m) = image[half_width, half_height]
print("Pixel at w, h - Red {} Green {} blue {} max {}".format(r, g, b, m))

This could be totally wrong; it is just my best guess. Would you please assist with how I can read img_mask to get the class ID?

Hi @kvgh82,

I believe something like this should work:

net.Mask(img_mask, half_width, half_height, "point")
jetson.utils.cudaDeviceSynchronize()
image = jetson.utils.cudaToNumpy(img_mask, half_width, half_height, 4)

y = 0  # y-coordinate of pixel you are reading
x = 0  # x-coordinate of pixel you are reading
pixel = image[y,x]

Note that the C++ version can output a binary mask, where each pixel corresponds to the class ID, so you needn't decode the color: https://github.com/dusty-nv/jetson-inference/blob/b734d9f5d091e81685877b19a9d1b4483c3b6321/c/segNet.h#L195

I will have to add this function for binary mask to the Python bindings in the next version.

@dusty_nv, thanks a lot. When are you planning to release the next version?

My hope is within the next month. Although once I add it to my development branch, I can let you know, so you can pick it up there.

@dusty_nv, yes, please let me know. One last thing: to get the pixel
pixel = image[y,x]
why do you put y first, then x? Isn't it pixel = image[x,y], where x represents the width and y represents the height of the image?

numpy arrays are ordered (height, width, channels), which follows the logical ordering of the data. If you print(image.shape) you can see that the height dimension is printed out first in the shape tuple.
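For example, the ordering is easy to verify with a dummy array standing in for the camera image (the sizes here are just placeholders):

```python
import numpy as np

# dummy 720x1280 RGBA image: axis 0 is height (rows), axis 1 is width (columns)
image = np.zeros((720, 1280, 4), dtype=np.float32)

print(image.shape)     # (720, 1280, 4) -- the height dimension comes first

pixel = image[10, 20]  # the pixel at row y=10, column x=20 (i.e. [y, x], not [x, y])
print(pixel)           # [0. 0. 0. 0.]
```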

OK, if you clone/build/install the dev branch (https://github.com/dusty-nv/jetson-inference/tree/dev), I have added Python support for getting the class mask. This gives you the class ID for each cell in the segmentation grid.

I updated segnet-console.py with a --stats option that computes the class-occurrence histogram from the mask using numpy (although you can replace this with whatever metrics you want):

# compute mask statistics
if opt.stats:
	import numpy as np
	print('computing class statistics...')

	# work with the raw classification grid dimensions
	grid_width, grid_height = net.GetGridSize()	
	num_classes = net.GetNumClasses()

	# allocate a single-channel uint8 image for the class mask
	class_mask = jetson.utils.cudaAllocMapped(width=grid_width, height=grid_height, format="gray8")

	# get the class mask (each pixel contains the classID for that grid cell)
	net.Mask(class_mask, grid_width, grid_height)

	# view as numpy array (doesn't copy data)
	mask_array = jetson.utils.cudaToNumpy(class_mask)	

	# compute the number of times each class occurs in the mask
	class_histogram, _ = np.histogram(mask_array, num_classes)

	print('grid size:   {:d}x{:d}'.format(grid_width, grid_height))
	print('num classes: {:d}'.format(num_classes))

	print('-----------------------------------------')
	print(' ID  class name        count     %')
	print('-----------------------------------------')

	for n in range(num_classes):
		percentage = float(class_histogram[n]) / float(grid_width * grid_height)
		print(' {:>2d}  {:<18s} {:>3d}   {:f}'.format(n, net.GetClassDesc(n), class_histogram[n], percentage)) 

Sample output:

$ ./segnet-console.py --stats --network=fcn-resnet18-cityscapes-1024x512 images/city_0.jpg test_city_0.jpg
...
computing class statistics...
grid size:   32x16
num classes: 21
-----------------------------------------
 ID  class name        count     %
-----------------------------------------
  0  void                26   0.050781
  1  ego_vehicle         11   0.021484
  2  ground             195   0.380859
  3  road                18   0.035156
  4  sidewalk            14   0.027344
  5  building             0   0.000000
  6  wall                 1   0.001953
  7  fence                3   0.005859
  8  pole                 2   0.003906
  9  traffic_light        6   0.011719
 10  traffic_sign         0   0.000000
 11  vegetation         166   0.324219
 12  terrain              4   0.007812
 13  sky                 22   0.042969
 14  person               2   0.003906
 15  car                 29   0.056641
 16  truck                0   0.000000
 17  bus                 10   0.019531
 18  train                0   0.000000
 19  motorcycle           0   0.000000
 20  bicycle              3   0.005859
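One subtlety in the histogram call above: without an explicit range, np.histogram infers the bin edges from the data's min and max, so if the lowest or highest class IDs never occur in a given mask, the bins stop lining up one-per-class. A small sketch of pinning the bins explicitly (a suggested refinement, not part of the original sample):

```python
import numpy as np

num_classes = 21

# toy mask where only classes 3, 11, and 20 occur
mask_array = np.array([3, 3, 11, 20], dtype=np.uint8)

# without range=, np.histogram would span the data's min..max (3..20 here),
# so the 21 bins would no longer correspond one-to-one with class IDs
class_histogram, _ = np.histogram(mask_array, bins=num_classes, range=(0, num_classes))

print(class_histogram[3], class_histogram[11], class_histogram[20])  # 2 1 1
```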

The segmentation grid used to generate the class mask is the raw output of the model, so it will be lower resolution than the original image. If desired, segNet.Mask() can rescale to whatever image size you pass in, using simple nearest-neighbor interpolation. To avoid extra computation when calculating statistics with the mask, it's better to operate at the size of the segmentation grid rather than the original image size. You can map any pixel from the mask to the original image using x/y scaling factors.
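As a rough sketch of that x/y scaling (the image and grid sizes below are hypothetical placeholders):

```python
# hypothetical sizes -- substitute your actual image and grid dimensions
img_width, img_height = 1024, 512
grid_width, grid_height = 32, 16

# pixels covered by one grid cell along each axis
scale_x = img_width / grid_width    # 32.0
scale_y = img_height / grid_height  # 32.0

# the pixel region of the original image covered by grid cell (gx, gy)
gx, gy = 5, 3
x0, y0 = int(gx * scale_x), int(gy * scale_y)
x1, y1 = int((gx + 1) * scale_x), int((gy + 1) * scale_y)
print((x0, y0), (x1, y1))  # (160, 96) (192, 128)
```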

@dusty_nv, I Can’t Thank You Enough! I’ll try it tonight.


@dusty_nv, if I want to use it with a live camera (segnet-camera.py), will it work in terms of performance, or should I compute the mask statistics in a separate thread?

I haven't profiled the performance, but since the segmentation grid is relatively small (compared to the input image), I think it should be fine. CPython doesn't really do concurrent multithreading anyhow, and the cudaImage objects these are stored in would probably have issues across a Python process pool.

@dusty_nv, after I cloned/built/installed the dev branch, everything works except the live camera examples; I get this error:
File "segnet-camera.py", line 73, in
img, width, height = camera.CaptureRGBA()
TypeError: 'jetson.utils.cudaImage' object is not iterable
[TRT] Could not register plugin creator: ::FlattenConcat_TRT

Do you think I should do a clean install? I've installed the master branch before in a separate folder.

OK, thanks for letting me know. If you pull/update your dev branch, I just checked in the fixes. The Python samples haven't changed; it was the internal Python bindings.

To update, you can run something like this:

cd jetson-inference-dev    # location of your dev branch
git pull origin dev
cd build
cmake ../
make
sudo make install

The sudo make install step will overwrite the previous installation of jetson-inference under /usr, so you needn’t worry about having the master branch in another folder. You can switch back to master by running sudo make install on your master branch folder.

Thanks a lot @dusty_nv.

Hi @dusty_nv, I used that code on the live camera (segnet-camera.py), and it looks like I am getting numbers that are off compared to the class color codes. Would you please look at this code to see which part I am doing wrong? I am commenting out jetson.utils.cudaDeviceSynchronize(), but I don't think that is the issue, since the code runs; I just get off numbers.

import numpy as np
import jetson.inference
import jetson.utils

import argparse
import ctypes
import sys

# parse the command line
parser = argparse.ArgumentParser(description="Segment a live camera stream using an semantic segmentation DNN.", 
						   formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.segNet.Usage())

parser.add_argument("--network", type=str, default="fcn-resnet18-voc", help="pre-trained model to load, see below for options")
parser.add_argument("--filter-mode", type=str, default="point", choices=["point", "linear"], help="filtering mode used during visualization, options are:\n  'point' or 'linear' (default: 'point')")
parser.add_argument("--ignore-class", type=str, default="void", help="optional name of class to ignore in the visualization results (default: 'void')")
parser.add_argument("--alpha", type=float, default=175.0, help="alpha blending value to use during overlay, between 0.0 and 255.0 (default: 175.0)")
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CSI camera to use (e.g. CSI camera 0)\nor for VL42 cameras, the /dev/video device to use.\nby default, MIPI CSI camera 0 will be used.")
parser.add_argument("--width", type=int, default=1280, help="desired width of camera stream (default is 1280 pixels)")
parser.add_argument("--height", type=int, default=720, help="desired height of camera stream (default is 720 pixels)")

try:
	opt = parser.parse_known_args()[0]
except:
	print("")
	parser.print_help()
	sys.exit(0)

# load the segmentation network
net = jetson.inference.segNet(opt.network, sys.argv)

# set the alpha blending value
net.SetOverlayAlpha(opt.alpha)

# the mask image is half the size
half_width = int(opt.width/2)
half_height = int(opt.height/2)

# allocate the output images for the overlay & mask
img_overlay = jetson.utils.cudaAllocMapped(opt.width * opt.height * 4 * ctypes.sizeof(ctypes.c_float))
img_mask = jetson.utils.cudaAllocMapped(half_width * half_height * 4 * ctypes.sizeof(ctypes.c_float))

# create the camera and display
camera = jetson.utils.gstCamera(opt.width, opt.height, opt.camera)
display = jetson.utils.glDisplay()

# process frames until user exits
while display.IsOpen():
	# capture the image
	img, width, height = camera.CaptureRGBA()

	# process the segmentation network
	net.Process(img, width, height, opt.ignore_class)

	# generate the overlay and mask
	net.Overlay(img_overlay, width, height, opt.filter_mode)
	net.Mask(img_mask, half_width, half_height, opt.filter_mode)
    
	#jetson.utils.cudaDeviceSynchronize()
	#jetson.utils.saveImageRGBA(opt.file_out, img_output, width, height)

	# compute mask statistics
	print('computing class statistics...')

	# work with the raw classification grid dimensions
	grid_width, grid_height = net.GetGridSize()	
	num_classes = net.GetNumClasses()

	# allocate a single-channel uint8 image for the class mask
	class_mask = jetson.utils.cudaAllocMapped(width=grid_width, height=grid_height, format="gray8")

	# get the class mask (each pixel contains the classID for that grid cell)
	net.Mask(class_mask, grid_width, grid_height)

	# view as numpy array (doesn't copy data)
	mask_array = jetson.utils.cudaToNumpy(class_mask)	

	# compute the number of times each class occurs in the mask
	class_histogram, _ = np.histogram(mask_array, num_classes)

	print('grid size:   {:d}x{:d}'.format(grid_width, grid_height))
	print('num classes: {:d}'.format(num_classes))

	print('-----------------------------------------')
	print(' ID  class name        count     %')
	print('-----------------------------------------')

	for n in range(num_classes):
		percentage = float(class_histogram[n]) / float(grid_width * grid_height)
		print(' {:>2d}  {:<18s} {:>3d}   {:f}'.format(n, net.GetClassDesc(n), class_histogram[n], percentage)) 

	# render the images
	display.BeginRender()
	display.Render(img_overlay, width, height)
	display.Render(img_mask, half_width, half_height, width)
	display.EndRender()

	# update the title bar
	display.SetTitle("{:s} | Network {:.0f} FPS".format(opt.network, net.GetNetworkFPS()))

It looks OK to me, although I would move this code above the camera loop so that you aren't allocating new buffers each frame (you can re-use them):

	# work with the raw classification grid dimensions
	grid_width, grid_height = net.GetGridSize()	
	num_classes = net.GetNumClasses()

	# allocate a single-channel uint8 image for the class mask
	class_mask = jetson.utils.cudaAllocMapped(width=grid_width, height=grid_height, format="gray8")

When you say the results are “off”, what do you mean?

The total number of pixels counted by the histogram will be smaller than the number of pixels in the camera image, because the segmentation classification grid is smaller. However, I believe the overall percentages should be similar.

@dusty_nv, by off numbers I mean that when the video shows red pixels, which represent person, or blue pixels, which represent sky, I get a count of 0.

You may want to try printing out or saving the contents of mask_array to tell if the data is valid. Then you can tell if the histogram calculation is getting valid data, or if the histogram is off.

Each number from mask_array should be a classID.
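As a quick sanity check, something like np.unique can summarize which class IDs actually appear (the array below is a stand-in for the real mask_array from the script above):

```python
import numpy as np

# stand-in for mask_array = jetson.utils.cudaToNumpy(class_mask)
mask_array = np.array([[0, 11, 11],
                       [15, 11, 0]], dtype=np.uint8)

# every element should be a valid class ID in [0, num_classes)
ids, counts = np.unique(mask_array, return_counts=True)
print(dict(zip(ids.tolist(), counts.tolist())))  # {0: 2, 11: 3, 15: 1}
```

If classes you can see on screen (e.g. person, sky) never show up in this summary, the mask itself is wrong; if they do show up but the histogram reports 0, the histogram calculation is off.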