I’ve started working with dusty-nv’s repo (GitHub - dusty-nv/jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson) to try out the Python scripts with SSD-Mobilenet-V2. I’m using the “detectnet-camera.py” script and I would like to work with the images from my CSI camera (cropping them, for example), but their type is “PyCapsule object” and I can’t find anything relevant about it on Google. I’m used to working with OpenCV frames (UMat), which can easily be cropped, but I can’t find any way to use these PyCapsule things. Does anyone know this type of object?
Hi Kevin, PyCapsule objects are used to pass pointers to memory around in Python without incurring memory copies or extra overhead (in this case, pointers to CUDA memory).
See the cudaToNumpy and cudaFromNumpy scripts to convert the PyCapsule to an ndarray.
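For example, a minimal sketch against the jetson.utils API of that era (the file name is just an illustration):

import jetson.utils

# loadImageRGBA returns a tuple: (PyCapsule, width, height)
img, width, height = jetson.utils.loadImageRGBA("example.jpg")

# wrap the same CUDA memory in a numpy ndarray -- no copy is made
array = jetson.utils.cudaToNumpy(img, width, height, 4)  # 4 channels for RGBA
print(array.shape)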
I’ve tried to use the cudaToNumpy script in the detectnet-camera one. So first I allocate the CUDA memory with the width, height, and depth of my image, then I create an array from it. It gives me an array full of zeros, so I’m sure I’m doing something wrong ^^ How do I fill this array with the pixels from my PyCapsule image?
What I want to do is crop the img from the CaptureRGBA method, but it is a PyCapsule object and I don’t know how to use it. The only way I know to crop images is when they are numpy.ndarray type, so I’m looking to convert this PyCapsule img into a numpy array :) Hope that makes it clearer for you.
So I looked up some of my code that does something similar. It’s untested and there are no guarantees it works or is fast, but you’ll get the idea. There is a cudaToNumpy function in jetson.utils that you might be able to use.
import jetson.utils
import numpy as np
import skimage.color
import skimage.transform

def simple_cpu_image_hash(image, size=8, channels=4, alpha_black=True) -> bytes:
    """
    adapted from:
    http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html

    :param image: tuple returned from jetson.utils.loadImageRGBA or similar
    :param size: scales to this size*size before returning the bytes
        (larger values will yield longer hashes and be more sensitive to change)
    :param channels: number of input channels
    :param alpha_black: composites images with alpha channel on black (False
        composites on white)
    :returns: a byte string representing a hash of the image.

    features of this hash:
    - different sizes of an image will likely have identical hashes
    - the location of a change in the returned bytestring will indicate the
      location of the change in the image because...
    - the hash is just a (very) scaled down, greyscale image representation as
      bytes.
    """
    background = np.zeros if alpha_black else np.ones
    # get a numpy array backed by the shared GPU/CPU memory;
    # there is no copy involved in creating the numpy array
    in_arr = jetson.utils.cudaToNumpy(*image, channels)  # type: np.ndarray
    # flatten the RGBA image onto a black (or white) background
    out_arr = skimage.color.rgba2rgb(in_arr, background(shape=3, dtype=np.uint8))
    # convert to greyscale
    out_arr = skimage.color.rgb2gray(out_arr)
    # resize to size*size using linear interpolation; anti-aliasing off since it's pointless here
    out_arr = skimage.transform.resize(out_arr, (size, size), anti_aliasing=False)
    return out_arr.tobytes()  # return the 'image' as raw bytes
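Hypothetical usage, with an illustrative file name (note the whole tuple from loadImageRGBA is passed, so the * in the function can unpack it):

img_tuple = jetson.utils.loadImageRGBA("frame.jpg")  # (capsule, width, height)
digest = simple_cpu_image_hash(img_tuple, size=8)
print(len(digest))  # 8*8 greyscale values as raw bytes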
Using the jetson.utils.cudaToNumpy function gives me the error “failed to get input array pointer from PyCapsule container”. Will continue to investigate, thank you.
I am not 100% certain about the error, but it is possible cudaToNumpy is being passed the whole tuple from the camera rather than the PyCapsule itself.
IIRC, the camera returns a tuple (a container) of the PyCapsule and two integers representing the dimensions. You can unpack a tuple (or any sequence type) into its components with the * operator in Python. That’s what’s going on in my code above; it’s effectively:
capsule, x, y = camera.CaptureRGBA()
in_arr = jetson.utils.cudaToNumpy(capsule, x, y, 4) # type: np.ndarray
The extra 4 specifies the number of channels: it’s 4 for the RGBA camera because RGBA has 4 channels.
The “-> bytes” in the function definition may not look like Python, but it is. Recent versions of Python allow specifying a return type like that. It’s not actually checked at runtime, however; it’s just a hint for the IDE, and there are other ways of doing the same thing (like a “# type: foo” comment or a note in the docstring).
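For illustration (both function names here are just hypothetical examples), the two styles say the same thing to the reader and to the IDE:

def frame_hash(image) -> bytes:
    # annotation syntax; ignored at runtime
    return b""

def frame_hash_legacy(image):
    # type: (tuple) -> bytes
    # comment syntax, equivalent hinting
    return b""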
Wasn’t sure it was Python, thanks ^^ Trying your method gives me this: “cudaToNumpy() argument after * must be an iterable, not PyCapsule”. This object is so strange! Thanks for your help and your time :) Will keep you updated if anything moves.
The thing is that I can’t find any workaround with the cudaToNumpy method… If I give it (img, width, height, 4) as arguments, I get “failed to get input array pointer from PyCapsule container”, and if I give it (*img, 4) I get “cudaToNumpy() argument after * must be an iterable, not PyCapsule”. I already tried other numbers of channels but nothing new happened.
So, in C, a pointer is just a number that points to a location in memory. To avoid a copy of the memory itself, which is expensive, you can pass this number around instead.
When it needs to be passed through Python, it needs to go in a Python object because that’s what everything is in Python. PyCapsule is one way to do that.
You can’t access the image directly because what’s in the capsule is just this magic number. Some C function with a Python wrapper, like cudaToNumpy, needs to take the capsule, do some surgery on it, and put the pointer (magic number) into a numpy array. No copy happens; just a little surgery, and you end up with an np.ndarray you can do whatever you want with.
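To make the “magic number in a box” idea concrete, here is a small sketch (nothing Jetson-specific) that pokes at a real capsule the standard library happens to expose:

import ctypes
import datetime

# the datetime module exports its C API through a PyCapsule
cap = datetime.datetime_CAPI
print(type(cap))   # <class 'PyCapsule'>

# all the capsule holds is a raw pointer; C code retrieves it like this:
PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
ptr = PyCapsule_GetPointer(cap, b"datetime.datetime_CAPI")
print(hex(ptr))    # just an address -- the "magic number"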
You shouldn’t need to do any, hopefully :) If you can get cudaToNumpy to work, it will do it for you. My explanation was more to explain why it’s not possible to access the image pixels directly from Python without using that function. Sorry if I wasn’t clear. With that function you should be able to read and write to the image array like any other ndarray.
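For example, the crop you originally wanted then becomes plain numpy slicing; a sketch with illustrative coordinates:

import jetson.utils
import numpy as np

camera = jetson.utils.gstCamera(1280, 720, "0")
img, width, height = camera.CaptureRGBA()              # unpack the tuple
arr = jetson.utils.cudaToNumpy(img, width, height, 4)  # no copy

# numpy slicing is rows (y) first, then columns (x)
crop = arr[100:300, 200:400]

# if the crop needs to go back to CUDA memory for the network,
# make it contiguous first (this step does copy)
cuda_crop = jetson.utils.cudaFromNumpy(np.ascontiguousarray(crop))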
Hi, this is my code for the moment. Tell me if you see something wrong; I’ll keep you updated.
PS: Sorry if I’m not working the same hours as you, but I’m French ^^
import jetson.inference
import jetson.utils
import argparse
import cv2
import time
import ctypes
import numpy as np
import skimage.color
import skimage.transform

# parse the command line
parser = argparse.ArgumentParser(description="Locate objects in a live camera stream using an object detection DNN.",
                                 formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.detectNet.Usage())
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2", help="pre-trained model to load, see below for options")
parser.add_argument("--threshold", type=float, default=0.42, help="minimum detection threshold to use")
parser.add_argument("--camera", type=str, default="0", help="index of the MIPI CSI camera to use (NULL for CSI camera 0)\nor for V4L2 cameras the /dev/video node to use.\nby default, MIPI CSI camera 0 will be used.")
parser.add_argument("--width", type=int, default=960, help="desired width of camera stream (default is 960 pixels)")
parser.add_argument("--height", type=int, default=616, help="desired height of camera stream (default is 616 pixels)")
opt, argv = parser.parse_known_args()

# load the object detection network
net = jetson.inference.detectNet(opt.network, argv, opt.threshold)

# create the camera and display
camera = jetson.utils.gstCamera(opt.width, opt.height, opt.camera)
display = jetson.utils.glDisplay()

nb_person_sas = 0

def simple_cpu_image_hash(image, size=8, alpha_black=True) -> bytes:
    background = np.zeros if alpha_black else np.ones
    in_arr = jetson.utils.cudaToNumpy(*image, 4)
    out_arr = skimage.color.rgba2rgb(in_arr, background(shape=3, dtype=np.uint8))
    out_arr = skimage.color.rgb2gray(out_arr)
    out_arr = skimage.transform.resize(out_arr, (size, size), anti_aliasing=False)
    return out_arr.tobytes()

# process frames until the user exits
while True:  # display.IsOpen():
    # capture the image
    img = camera.CaptureRGBA()
    out_test = simple_cpu_image_hash(img, 8, True)
    print(out_test)

    tick = time.time()
    detections = net.Detect(img, opt.width, opt.height)
    tock = time.time() - tick

    # print the detections
    # print("detected {:d} objects in image".format(len(detections)))

    # render the image
    display.RenderOnce(img, width, height)

    # update the title bar
    display.SetTitle("{:s} | Network {:.0f} FPS".format("POC SAVARI", 1000.0 / net.GetNetworkTime()))
img is actually a Tuple[PyCapsule, int, int] (that’s typing notation). In simpler terms, it’s a tuple (an immutable sequence) of a pointer to the image, the width, and the height, so you need to unpack it before calling a function whose signature is (image, width, height). Use

img, width, height = camera.CaptureRGBA()

… rather than …

img = camera.CaptureRGBA()

The tuple is just a way to pass three things around conveniently in one container. Since the image, width, and height are logically related and all the functions that use them have the same signature, it makes sense to pass them around this way. The same call could also be written with the * unpacking operator, e.g. detections = net.Detect(*camera.CaptureRGBA()).
Also, you don’t really have to, but if you do use the image hashing function in your code, please preserve the docstring, the comments, and the note about where it came from :)