What are PyCapsule objects in Jetson-Inference Python scripts?

Hi everyone,

I’ve started working with dusty-nv’s repo (GitHub - dusty-nv/jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.) to try out the Python scripts with SSD-Mobilenet-V2. I’m using the “detectnet-camera.py” script and I would like to work with the images from my CSI camera (cropping them, for example), but their type is “PyCapsule object” and I can’t find anything relevant about it on Google. I’m used to working with OpenCV frames (UMat frames), which can easily be cropped, but I can’t find any way to use these PyCapsule things. Does anyone know this type of object?

Regards,

Kévin Marconi

Hi Kevin, PyCapsule objects are used to pass pointers to memory (in this case, CUDA memory) around in Python without incurring memory copies or extra overhead.

See the cudaToNumpy and cudaFromNumpy scripts to convert the PyCapsule to an ndarray.
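
For a rough idea, here is an untested sketch (assuming the tuple-style API where loadImageRGBA() returns the image capsule plus its dimensions):

import jetson.utils

# load an image into CUDA memory; returns (PyCapsule, width, height)
img, width, height = jetson.utils.loadImageRGBA("test.jpg")

# map the same CUDA memory into a numpy ndarray (4 = RGBA channels, no copy)
array = jetson.utils.cudaToNumpy(img, width, height, 4)
print(array.shape)  # expected: (height, width, 4)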

Thanks for the reply, will take a look at it :)

Hi

I’ve tried to use the cudaToNumpy script in the detectnet-camera one. First I allocate the CUDA memory with the width, height, and depth of my image, then I create an array from it. It gives me an array full of zeros, so I’m sure I’m doing something wrong ^^ How do I fill this array with the pixels from my PyCapsule image?

Regards

Kévin Marconi

If you just allocated the CUDA memory, it would still be blank and filled with zeros.

It sounds like what you want to do is convert your cv::Mat to a numpy ndarray, and then call cudaFromNumpy().
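
In Python an OpenCV frame already is a numpy ndarray, so only the cudaFromNumpy() step is needed. A rough, untested sketch (the color conversion is an assumption for an RGBA pipeline):

import cv2
import numpy as np
import jetson.utils

frame = cv2.imread("test.jpg")                  # uint8 BGR ndarray
rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)  # detectNet works on RGBA
rgba = rgba.astype(np.float32)                  # match the float RGBA capture

# copies the ndarray into CUDA memory and returns a PyCapsule you can
# pass on to functions like net.Detect(capsule, width, height)
capsule = jetson.utils.cudaFromNumpy(rgba)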

What I want to do is crop the img from the CaptureRGBA method, but it is a PyCapsule object and I don’t know how to use it. The only way I know to crop an img is when it is a numpy.ndarray, so I’m looking to convert this PyCapsule img into a numpy array :) Hope that is clearer for you.

So I looked up some of my code doing something similar. It’s untested and there are no guarantees it works, or is fast, but you get the idea. There is a cudaToNumpy function in jetson.utils that you might be able to use.

import numpy as np
import skimage.color
import skimage.transform

import jetson.utils


def simple_cpu_image_hash(image, size=8, channels=4, alpha_black=True) -> bytes:
    """
    adapted from:

    http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html

    :arg image: Tuple returned from jetson.utils.loadImageRGBA or similar
    :param size: scales to this size*size before returning the bytes
        (larger values will yield longer hashes and be more sensitive to change)
    :param channels: number of input channels
    :param alpha_black: composites images with alpha channel on black (false
        composites to white)

    :returns: a byte string representing a hash of the image.

    features of this hash:

    - different sizes of an image will likely have identical hashes
    - The location of the change in the returned bytestring will indicate the
      location of change in the hash because...
    - The hash is just a (very) scaled down, greyscale, image representation as
      bytes.
    """

    background = np.zeros if alpha_black else np.ones

    # get a numpy array with the array in shared gpu/cpu memory
    # there is no copy involved in creation of the numpy array
    in_arr = jetson.utils.cudaToNumpy(*image, channels)  # type: np.ndarray

    # flatten rgba image on black background
    out_arr = skimage.color.rgba2rgb(in_arr,  # in image over
                                     background(shape=3, dtype=np.uint8))

    # convert to greyscale
    out_arr = skimage.color.rgb2gray(out_arr)

    # resize to size * size using linear interpolation, aa off since pointless
    out_arr = skimage.transform.resize(
        out_arr, (size, size), anti_aliasing=False)

    return out_arr.tobytes()  # return the 'image' as raw bytes

Edit: fix: made channels a parameter
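
Usage would look something like this (also untested; per the docstring, the argument is the tuple from loadImageRGBA or similar):

img = jetson.utils.loadImageRGBA("test.jpg")  # (capsule, width, height)
h = simple_cpu_image_hash(img, size=8, channels=4)
print(h.hex())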

Hi,

Using the jetson.utils.cudaToNumpy function gives me the error “failed to get input array pointer from PyCapsule container”. Will continue to investigate, thank you.

PS: Just so everybody knows, I’m working with Python.

I am not 100% certain about the error, but it is possible the tuple from the camera is being passed as-is rather than unpacked into the PyCapsule and its dimensions.

IIRC, the camera returns a tuple (a container) of the PyCapsule and two integers representing the dimensions. You can unpack a tuple (or any sequence type) into its components with the * operator in Python. That’s what’s going on in my code above, where it’s

image = camera.CaptureRGBA()
...
in_arr = jetson.utils.cudaToNumpy(*image, 4)  # type: np.ndarray

It’s more or less shorthand for:

capsule, x, y = camera.CaptureRGBA()
in_arr = jetson.utils.cudaToNumpy(capsule, x, y, 4)  # type: np.ndarray

The extra 4 specifies the depth. It’s 4 for the RGBA camera because it has 4 channels.

The “-> bytes” in the function definition may not look like Python, but it is. Recent versions of Python allow specifying a return type like that. It’s not actually checked at runtime, however; it’s just hinting for the IDE, and there are other ways of doing that (like a “# type: foo” comment at the return line, or in the docstring).
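
A tiny example of that syntax:

def area(width: int, height: int) -> int:  # "-> int" is only a hint
    return width * height                  # nothing checks types at runtime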

Here is another example from NVIDIA.

Wasn’t sure it was Python, thanks ^^ Trying your method gives me this: “cudaToNumpy() argument after * must be an iterable, not PyCapsule”. This object is so strange! Thanks for your help and your time :) Will keep you updated if anything moves.

It looks like you passed the right object, then. Perhaps the number of channels is incorrect? RGBA would be 4, RGB or BGR would be 3, etc.

The thing is that I can’t find any workaround with the cudaToNumpy method… If I give it (img, width, height, 4) as arguments, I get “failed to get input array pointer from PyCapsule container”, and if I give it (*img, 4) I get “cudaToNumpy() argument after * must be an iterable, not PyCapsule”. I already tried other numbers of channels, but nothing new happened.

So, in C, a pointer is just a number that points to a location in memory. To avoid a copy of the memory itself, which is expensive, you can pass this number around instead.

When it needs to be passed through Python, it needs to go in a Python object because that’s what everything is in Python. PyCapsule is one way to do that.

You can’t access the image directly because what’s in the capsule is just this magic number. Some C function with a Python wrapper, like cudaToNumpy, needs to take the capsule, do some surgery on it, and put the pointer (magic number) into the numpy array. No copy happens. Just a little surgery, and you end up with an np.ndarray you can do what you want with.

Thanks for your explanation. My work is to find how to do this surgery ^^

You shouldn’t need to do any, hopefully :) If you can get cudaToNumpy to work, it will do it for you. My explanation was more to explain why it’s not possible to access the image pixels directly from Python without using that function. Sorry if I wasn’t clear. With that function you should be able to read and write to the image array like any other ndarray.
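
For example, cropping (your original goal) would look something like this untested sketch (the slice bounds are arbitrary placeholders):

import jetson.utils

# camera = jetson.utils.gstCamera(...) created as elsewhere in this thread
img, width, height = camera.CaptureRGBA()

array = jetson.utils.cudaToNumpy(img, width, height, 4)  # no copy
jetson.utils.cudaDeviceSynchronize()  # let the GPU finish before CPU reads

# plain numpy slicing: rows (y) first, then columns (x)
crop = array[100:300, 200:600]

# slicing returns a view into the same mapped memory; use .copy() if the
# crop must outlive the capture buffer
crop = crop.copy()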

Hi, this is my code for the moment. Tell me if you see something wrong, will keep updated.

PS: Sorry if I’m not working the same hours as you, but I’m French ^^

import jetson.inference
import jetson.utils

import argparse
import cv2
import time
import ctypes
import numpy as np

# parse the command line
parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.",
                                 formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.detectNet.Usage())

parser.add_argument("--network", type=str, default="ssd-mobilenet-v2", help="pre-trained model to load, see below for options")
parser.add_argument("--threshold", type=float, default=0.42, help="minimum detection threshold to use")
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CSI camera to use (NULL for CSI camera 0)\nor for V4L2 cameras the /dev/video node to use.\nby default, MIPI CSI camera 0 will be used.")
parser.add_argument("--width", type=int, default=960, help="desired width of camera stream (default is 1280 pixels)")
parser.add_argument("--height", type=int, default=616, help="desired height of camera stream (default is 720 pixels)")

opt, argv = parser.parse_known_args()

# load the object detection network
net = jetson.inference.detectNet(opt.network, argv, opt.threshold)

# create the camera and display
camera = jetson.utils.gstCamera(opt.width, opt.height, opt.camera)
display = jetson.utils.glDisplay()


nb_person_sas = 0

def simple_cpu_image_hash(image, size=8, alpha_black=True) -> bytes:
	background = np.zeros if alpha_black else np.ones
	in_arr = jetson.utils.cudaToNumpy(*image, 4)
	out_arr = skimage.color.rgba2rgb(in_arr, background(shape=3, dtype=np.uint8))
	out_arr = skimage.color.rgb2gray(out_arr)
	out_arr = skimage.transform.resize(out_arr, (size, size), anti_aliasing=False)
	return out_arr.tobytes()

while True:  # process frames until user exits
	# capture the image
	img = camera.CaptureRGBA()

	out_test = simple_cpu_image_hash(img, 8, True)
	print(out_test)

	tick = time.time()
	detections = net.Detect(img, opt.width, opt.height)
	tock = time.time() - tick

	# print the detections
	#print("detected {:d} objects in image".format(len(detections)))

	# render the image
	display.RenderOnce(img, width, height)

	# update the title bar
	display.SetTitle("{:s} | Network {:.0f} FPS".format("POC SAVARI", 1000.0 / net.GetNetworkTime()))

Can we do this with an RTSP stream?

Looks mostly correct, except line 16, a lack of an skimage import, and the last part, which should look like this:

# capture the image
img = camera.CaptureRGBA()

out_test = simple_cpu_image_hash(img, 8, True)

...

detections = net.Detect(*img)

...

display.RenderOnce(*img)

The img is actually a Tuple[PyCapsule, int, int] (that’s typing notation). In simpler terms, it’s a tuple (an immutable sequence) of a pointer to the image, the width, and the height, so you need to unpack them into a function that has the same signature (image, width, height).

img = camera.CaptureRGBA()

… rather than …

img, width, height = camera.CaptureRGBA()

is just a way to pass around three things conveniently in one container. Since the image, width, and height are logically related and all the functions that use them have the same signature, it makes sense to pass them around this way. The same code could be written like:

# capture the image
capsule, width, height = camera.CaptureRGBA()

out_test = simple_cpu_image_hash((capsule, width, height), 8, True)

...

detections = net.Detect(capsule, width, height)

...

display.RenderOnce(capsule, width, height)

Also, you don’t really have to, but if you do use the image hashing function in your code, please preserve the docstrings, comments, and where you found it :)

Do you mean use the Jetson libraries with an RTSP stream? I suppose you could, but it’s probably easier to use DeepStream for that.
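
If DeepStream isn’t an option, here is a rough, untested sketch of the OpenCV route (the URL is a placeholder, net is the detectNet from earlier in the thread, and every frame gets copied, so this is not the fast path):

import cv2
import numpy as np
import jetson.utils

cap = cv2.VideoCapture("rtsp://user:pass@192.168.1.10:554/stream")

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # convert the BGR frame to float RGBA and copy it into CUDA memory
    rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA).astype(np.float32)
    capsule = jetson.utils.cudaFromNumpy(rgba)
    detections = net.Detect(capsule, rgba.shape[1], rgba.shape[0])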