What are PyCapsule objects in Jetson-Inference Python scripts?

Hi everyone,

I’ve started working with Dusty-nv’s repo (https://github.com/dusty-nv/jetson-inference) to try to use the Python scripts with SSD-mobilenet-V2. I’m using the “detectnet-camera.py” script and I would like to work with the images from my CSI camera (cropping them, for example), but their type is “PyCapsule object” and I can’t find anything relevant about it on Google. I’m used to working with OpenCV frames (UMat frame), which can easily be cropped, but I can’t find any way to use these PyCapsule things. Does anyone know this type of object?


Kévin Marconi

Hi Kevin, PyCapsule objects are used to pass pointers to memory without incurring memory copies in Python or extra overhead (in this case, CUDA memory).

See the cudaToNumpy and cudaFromNumpy scripts for converting between the PyCapsule and a numpy ndarray.
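
To make this concrete without a Jetson, here is a small sketch that uses only standard CPython: ctypes gives access to the C API functions PyCapsule_New and PyCapsule_GetPointer, and an ordinary 16-byte host buffer stands in for the image. It illustrates what a capsule is, not how jetson.utils creates its own capsules. The point is that the capsule only carries an address: Python can tell you its type, but it cannot index it.

```python
import ctypes

# Declare the C signatures of the two CPython C-API functions we call via ctypes.
api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# 16 bytes of ordinary host memory standing in for an image buffer.
buf = (ctypes.c_uint8 * 16)(*range(16))
addr = ctypes.addressof(buf)

# Wrap the raw address in a capsule (no name, no destructor). No bytes are copied.
capsule = api.PyCapsule_New(addr, None, None)

print(type(capsule).__name__)                           # PyCapsule
print(api.PyCapsule_GetPointer(capsule, None) == addr)  # True: it only holds the address

# Python has no idea what the address points to, so the capsule cannot be indexed.
try:
    capsule[0]
except TypeError as err:
    print("not indexable:", err)
```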

Thanks for the reply, I’ll take a look at it :)


I’ve tried to use the cudaToNumpy script inside the detectnet-camera one. First, I allocate CUDA memory with the width, height and depth of my image, then I create an array from it. It gives me an array full of 0s, so I’m sure I’m doing something wrong ^^ How do I fill this array with the pixels from my PyCapsule image?



If you just allocated the CUDA memory, it would still be blank and filled with zeros.

It sounds like what you want to do is convert your CvMat to a numpy ndarray, and then call cudaFromNumpy().
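
A rough sketch of the “convert to numpy first” step, using only numpy. It assumes (this is not confirmed in the thread) that the legacy jetson.utils functions expect an RGBA float32 array with values in 0-255, and that your OpenCV frame is an ordinary BGR uint8 ndarray; `bgr_to_rgba_float` is a hypothetical helper name. OpenCV’s cv2.cvtColor with COLOR_BGR2RGBA does the same channel shuffle. The result is what you would then hand to cudaFromNumpy(), which is not called here.

```python
import numpy as np


def bgr_to_rgba_float(bgr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: OpenCV-style BGR uint8 (H x W x 3) -> RGBA float32 (H x W x 4).
    Assumes the target wants values in 0-255 and an opaque alpha channel."""
    h, w, _ = bgr.shape
    rgba = np.empty((h, w, 4), dtype=np.float32)
    rgba[..., :3] = bgr[..., ::-1]   # reverse channel order: B, G, R -> R, G, B
    rgba[..., 3] = 255.0             # fully opaque alpha
    return rgba


bgr = np.array([[[10, 20, 30]]], dtype=np.uint8)   # one pixel: B=10, G=20, R=30
print(bgr_to_rgba_float(bgr)[0, 0].tolist())       # [30.0, 20.0, 10.0, 255.0]
```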

What I want to do is crop the img returned by the captureRGBA method, but it is a PyCapsule object and I don’t know how to use it. The only way I know to crop images is when they are of type numpy.ndarray, so I’m looking to convert this PyCapsule image into a numpy array :) Hope that’s clearer.
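
Once you do have an ndarray, cropping is just slicing. A minimal sketch, assuming an H x W x C layout (rows first), so the y range comes before the x range; `crop` is a hypothetical helper and the frame is a numpy stand-in for a camera image.

```python
import numpy as np


def crop(img: np.ndarray, x: int, y: int, w: int, h: int) -> np.ndarray:
    """Hypothetical helper: the h x w region whose top-left corner is (x, y).
    Rows (y) come first in an H x W x C array, then columns (x)."""
    return img[y:y + h, x:x + w]


# Stand-in RGBA frame: 6 rows (height), 8 columns (width), 4 channels.
frame = np.arange(6 * 8 * 4, dtype=np.float32).reshape(6, 8, 4)

roi = crop(frame, x=2, y=1, w=3, h=4)
print(roi.shape)                          # (4, 3, 4)
print((roi[0, 0] == frame[1, 2]).all())   # True: top-left pixel of the crop
```

Note that numpy silently truncates slices that run past the edge of the image, so you may want to clamp x, y, w and h yourself.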

So I looked up some of my code doing something similar. It’s untested and there are no guarantees it works, or is fast, but you get the idea. There is a cudaToNumpy function in jetson.utils that you might be able to use.

import jetson.utils
import numpy as np
import skimage.color
import skimage.transform


def simple_cpu_image_hash(image, size=8, channels=4, alpha_black=True) -> bytes:
    """
    adapted from:

    :arg image: Tuple returned from jetson.utils.loadImageRGBA or similar
    :param size: scales to this size*size before returning the bytes
        (larger values will yield longer hashes and be more sensitive to change)
    :param channels: number of input channels
    :param alpha_black: composites images with alpha channel on black (False
        composites to white)

    :returns: a byte string representing a hash of the image.

    features of this hash:

    - different sizes of an image will likely have identical hashes
    - The location of the change in the returned bytestring will indicate the
      location of change in the hash because...
    - The hash is just a (very) scaled down, greyscale, image representation as
      raw bytes
    """
    background = np.zeros if alpha_black else np.ones

    # get a numpy array with the array in shared gpu/cpu memory
    # there is no copy involved in creation of the numpy array
    in_arr = jetson.utils.cudaToNumpy(*image, channels)  # type: np.ndarray

    # flatten rgba image onto the background (black or white)
    # (no trailing comma here, otherwise out_arr would become a 1-element tuple)
    out_arr = skimage.color.rgba2rgb(in_arr,  # in image over
                                     background(shape=3, dtype=np.uint8))

    # convert to greyscale
    out_arr = skimage.color.rgb2gray(out_arr)

    # resize to size * size using linear interpolation, aa off since pointless
    out_arr = skimage.transform.resize(
        out_arr, (size, size), anti_aliasing=False)

    return out_arr.tobytes()  # return the 'image' as raw bytes

Edit: fix: made channels a parameter
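
If you want to try the hashing idea on the CPU without a Jetson or scikit-image, here is a hypothetical numpy-only variant (`numpy_image_hash` is an illustrative name, not part of any library). It follows the same steps as the function above: composite RGBA over a black or white background, convert to greyscale with the luminance weights skimage’s rgb2gray uses, and shrink to size x size. It swaps skimage’s interpolated resize for plain block averaging, so it assumes the height and width are multiples of `size` and that pixel values are in 0-255.

```python
import numpy as np


def numpy_image_hash(rgba: np.ndarray, size: int = 8, alpha_black: bool = True) -> bytes:
    """Hypothetical numpy-only variant of simple_cpu_image_hash.

    rgba: H x W x 4 array with values in 0-255; H and W must be multiples of `size`.
    Returns the size x size greyscale thumbnail (float64) as raw bytes.
    """
    img = rgba.astype(np.float64) / 255.0
    alpha = img[..., 3:4]                      # keep a trailing axis for broadcasting
    background = 0.0 if alpha_black else 1.0
    rgb = img[..., :3] * alpha + background * (1.0 - alpha)   # composite over background

    # Luminance weights as used by skimage.color.rgb2gray.
    grey = rgb @ np.array([0.2125, 0.7154, 0.0721])

    # Shrink to size x size by averaging equal-sized blocks (instead of interpolation).
    h, w = grey.shape
    small = grey.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return small.tobytes()


rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(16, 16, 4)).astype(np.float32)
bigger = frame.repeat(2, axis=0).repeat(2, axis=1)   # same picture at 32 x 32

h1, h2 = numpy_image_hash(frame), numpy_image_hash(bigger)
print(len(h1))                                              # 512 (8 * 8 float64 values)
print(np.allclose(np.frombuffer(h1), np.frombuffer(h2)))    # True
```

Because it averages blocks, an image and a 2x upscaled copy give (numerically) the same hash, in line with the “different sizes of an image will likely have identical hashes” property listed above.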


Using the jetson.utils.cudaToNumpy function gives me the error “failed to get input array pointer from PyCapsule container”. I’ll keep investigating, thank you.

PS: Just so everybody knows, I’m working with Python.

I am not 100% certain about the error, but it’s possible that what you’re passing is the tuple returned by the camera rather than the PyCapsule itself.

IIRC, the camera returns a tuple (a container) of a PyCapsule and two integers representing the dimensions. You can unpack a tuple (or any sequence type) into its components with the * operator in Python. That’s what’s going on in my code above, where it’s:

image = camera.CaptureRGBA()
in_arr = jetson.utils.cudaToNumpy(*image, 4)  # type: np.ndarray

It’s more or less shorthand for:

capsule, x, y = camera.CaptureRGBA()
in_arr = jetson.utils.cudaToNumpy(capsule, x, y, 4)  # type: np.ndarray

The extra 4 specifies the depth, i.e. the number of channels: it’s 4 for the RGBA camera image because RGBA has 4 channels.
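
To see what the * does without any camera, here is a sketch with two hypothetical stand-ins: `fake_capture_rgba()` returns a (capsule, width, height) tuple, as CaptureRGBA() is described as doing above, and `to_numpy()` takes the same (capsule, width, height, depth) arguments as the cudaToNumpy call. It also shows the two ways the call can go wrong.

```python
def fake_capture_rgba():
    """Hypothetical stand-in for camera.CaptureRGBA(): returns (capsule, width, height)."""
    return object(), 1280, 720        # object() plays the role of the PyCapsule


def to_numpy(capsule, width, height, depth):
    """Hypothetical stand-in with the same argument order as the cudaToNumpy call above."""
    return (width, height, depth)


image = fake_capture_rgba()

# *image spreads the tuple into three positional arguments, so these are equivalent:
print(to_numpy(*image, 4))            # (1280, 720, 4)
capsule, x, y = image
print(to_numpy(capsule, x, y, 4))     # (1280, 720, 4)

# Mistake 1: forgetting the * passes the whole tuple as the `capsule` argument.
try:
    to_numpy(image, 4)
except TypeError as err:
    print("missing *:", err)

# Mistake 2: using * on the bare capsule (already unpacked) fails: it is not iterable.
try:
    to_numpy(*capsule, 4)
except TypeError as err:
    print("extra *:", err)
```

The second mistake produces the same kind of “argument after * must be an iterable” message that comes up later in this thread.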

The “-> bytes” in the function definition may not look like Python, but it is. Recent versions of Python allow specifying a return type like that. It’s not actually checked at runtime, however; it’s just a hint for the IDE, and there are other ways of doing the same thing (like a # type: foo comment, as on the cudaToNumpy line above, or in the docstring).
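
A quick way to convince yourself that the annotation is only a hint: the sketch below (hypothetical function names) declares `-> bytes`, returns a str anyway, and Python runs it without complaint; the annotation is simply stored on the function. The second function uses the PEP 484 type-comment spelling of the same hint, which only tools such as type checkers read.

```python
def hash_bytes() -> bytes:
    return "not bytes at all"   # wrong type on purpose; nothing checks it at runtime


def hash_bytes_commented():
    # type: () -> bytes
    return b"\x00"


print(type(hash_bytes()).__name__)           # str
print(hash_bytes.__annotations__)            # {'return': <class 'bytes'>}
print(hash_bytes_commented.__annotations__)  # {} (the type comment is not stored)
```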

Here is another example from NVIDIA.

Wasn’t sure it was Python, thanks ^^ Trying your method gives me this: “cudaToNumpy() argument after * must be an iterable, not PyCapsule”. This object is so strange! Thanks for your help and your time :) I’ll keep you updated if anything changes.

It looks like you passed the right object then. Perhaps the number of channels is incorrect? RGBA would be 4, RGB or BGR would be 3, etc.
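
A numpy-only illustration of why the channel count matters: the same flat buffer only makes sense as an image if width x height x channels matches the number of elements it holds. This is a stand-in, not what cudaToNumpy does internally (how it reacts to a wrong depth isn’t covered here).

```python
import numpy as np

width, height = 4, 2
flat = np.arange(width * height * 4, dtype=np.float32)   # 32 values: a tiny RGBA frame

rgba = flat.reshape(height, width, 4)    # depth 4 matches: shape (2, 4, 4)
print(rgba.shape)

try:
    flat.reshape(height, width, 3)       # depth 3 asks for 24 values, but there are 32
except ValueError as err:
    print("wrong channel count:", err)
```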

The thing is that I can’t find any workaround with the cudaToNumpy method… If I give it (img, width, height, 4) as arguments, I get “failed to get input array pointer from PyCapsule container”, and if I give it (*img, 4), I get “cudaToNumpy() argument after * must be an iterable, not PyCapsule”. I already tried other numbers of channels, but nothing changed.

So, in C, a pointer is just a number that points to a location in memory. To avoid a copy of the memory itself, which is expensive, you can pass this number around instead.

When it needs to be passed through Python, it needs to go in a Python object because that’s what everything is in Python. PyCapsule is one way to do that.

You can’t access the image directly because what’s in the capsule is just this magic number. Some C function with a python wrapper like cudaToNumpy needs to take the capsule, do some surgery on it, and put the pointer (magic number) in the numpy array. No copy happens. Just a little surgery, and you end up with a np.ndarray you can do what you want with.
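
Here is a sketch of that “surgery” in pure Python, with ctypes standing in for the C extension code: it pulls the address back out of a capsule with PyCapsule_GetPointer and wraps it in a numpy array with numpy.ctypeslib.as_array, without copying. It assumes an unnamed capsule holding an ordinary host buffer of uint8 pixels; the real cudaToNumpy does the equivalent in C on CUDA-mapped memory, and `capsule_to_ndarray` is a hypothetical name.

```python
import ctypes
import numpy as np

# ctypes declarations for the CPython capsule API.
api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]


def capsule_to_ndarray(capsule, width, height, channels):
    """Hypothetical helper: view the memory behind `capsule` as a
    height x width x channels uint8 array, without copying it."""
    addr = api.PyCapsule_GetPointer(capsule, None)           # the "magic number"
    ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint8))  # typed pointer to byte 0
    return np.ctypeslib.as_array(ptr, shape=(height, width, channels))


# A 3 x 2 RGBA "image" (all zeros) in host memory, wrapped in a capsule.
# `pixels` must stay alive as long as the array view is used.
w, h, c = 3, 2, 4
pixels = (ctypes.c_uint8 * (w * h * c))()
capsule = api.PyCapsule_New(ctypes.addressof(pixels), None, None)

arr = capsule_to_ndarray(capsule, w, h, c)
arr[0, 0] = [255, 0, 0, 255]   # write the top-left pixel through the numpy view...
print(list(pixels[:4]))        # ...and the original buffer changed: [255, 0, 0, 255]
print(arr.shape)               # (2, 3, 4)
```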

Thanks for your explanation. Now my job is to figure out how to do this surgery ^^

You shouldn’t need to do any, hopefully :) If you can get cudaToNumpy to work, it will do it for you. My explanation was more to explain why it’s not possible to access the image pixels directly from Python without using that function. Sorry if I wasn’t clear. With that function you should be able to read and write the image array like any other ndarray.
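
To illustrate “read and write like any other ndarray”: in numpy, basic slicing (a crop, for example) gives a view that shares memory with the original array, so writes go through, while .copy() gives an independent array. The frame here is a plain numpy stand-in; whether writes through a cudaToNumpy array reach the GPU side depends on it being a zero-copy view, as described above.

```python
import numpy as np

frame = np.zeros((4, 6, 4), dtype=np.float32)   # stand-in RGBA frame: 4 rows, 6 columns

roi = frame[1:3, 2:5]          # crop by basic slicing: a view, no copy
roi[...] = 255.0               # writing into the view...
print(frame[1, 2, 0])          # ...changes the frame: 255.0
print(np.shares_memory(roi, frame))  # True

safe = frame[0:2, 0:2].copy()  # explicit copy: independent of the frame
safe[...] = 7.0
print(frame[0, 0, 0])          # 0.0, the frame is untouched
```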

Hi, this is my code so far. Tell me if you see anything wrong; I’ll keep you updated.

PS: Sorry if I’m not online at the same hours as you, but I’m French ^^

import jetson.inference

import jetson.utils

import argparse

import cv2

import time

import ctypes

import numpy as np

# parse the command line
parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.",
						formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.detectNet.Usage())

parser.add_argument("--network", type=str, default="ssd-mobilenet-v2", help="pre-trained model to load, see below for options")

parser.add_argument("--threshold", type=float, default=0.42, help="minimum detection threshold to use")

parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CSI camera to use (NULL for CSI camera 0)\nor for V4L2 cameras the /dev/video node to use.\nby default, MIPI CSI camera 0 will be used.")

parser.add_argument("--width", type=int, default=960, help="desired width of camera stream (default is 960 pixels)")

parser.add_argument("--height", type=int, default=616, help="desired height of camera stream (default is 616 pixels)")

opt, argv = parser.parse_known_args()

# load the object detection network

net = jetson.inference.detectNet(opt.network, argv, opt.threshold)

# create the camera and display

camera = jetson.utils.gstCamera(opt.width, opt.height, opt.camera)

display = jetson.utils.glDisplay()


def simple_cpu_image_hash(image,size=8,alpha_black=True) -> bytes:
	background = np.zeros if alpha_black else np.ones





	return out_arr.tobytes()

while True:#.IsOpen():
 # process frames until user exits

	# capture the image

	img= camera.CaptureRGBA()

	detections = net.Detect(img, opt.width, opt.height)

	# print the detections

	#print("detected {:d} objects in image".format(len(detections)))

	# render the image

	display.RenderOnce(img, width, height)

	# update the title bar

	display.SetTitle("{:s} | Network {:.0f} FPS".format("POC SAVARI", 1000.0 / net.GetNetworkTime()))

Can we do this with an RTSP stream?

Looks mostly correct except for line 16 and the missing skimage import. Also, the last part should look like this:

    # capture the image
    img = camera.CaptureRGBA()

    out_test = simple_cpu_image_hash(img, 8, True)

    detections = net.Detect(*img)

    display.RenderOnce(*img)

img is actually a Tuple[PyCapsule, int, int] (that’s typing notation). In simpler terms, it’s a tuple (an immutable sequence) of a pointer to the image, the width, and the height, so you need to unpack it into a function that has the same signature (image, width, height).

Writing

img = camera.CaptureRGBA()

rather than

img, width, height = camera.CaptureRGBA()

is just a way to pass three things around conveniently in one container. Since the image, width, and height are logically related and all the functions that use them have the same signature, it makes sense to pass them around this way. The same code could be written like this:

    # capture the image
    capsule, width, height = camera.CaptureRGBA()

    out_test = simple_cpu_image_hash((capsule, width, height), 8, True)

    detections = net.Detect(capsule, width, height)

    display.RenderOnce(capsule, width, height)
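
If you like the convenience of one container but want the fields to be self-describing, a typing.NamedTuple is one option (an alternative design, not what jetson.utils returns): it still unpacks with * into an (image, width, height) signature and also exposes .width and .height. `CapturedImage` and `detect` below are hypothetical stand-ins.

```python
from typing import Any, NamedTuple


class CapturedImage(NamedTuple):
    """Hypothetical container: the same three fields in the same order, but named."""
    capsule: Any   # would be the PyCapsule from CaptureRGBA()
    width: int
    height: int


def detect(capsule, width, height):
    """Stand-in with the (image, width, height) signature used above."""
    return width * height


img = CapturedImage(object(), 960, 616)
print(detect(*img))            # 591360: still unpacks like a plain tuple
print(img.width, img.height)   # 960 616: but the fields have names too
capsule, width, height = img   # and ordinary tuple unpacking still works
```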

Also, you don’t really have to, but if you do use the image hashing function in your code, please preserve the docstrings, comments, and where you found it :)

Do you mean using the Jetson libraries with an RTSP stream? I suppose you could, but it’s probably easier to use DeepStream for that.