jetson-inference python multiprocessing

I am trying to reduce the conversion time with multiprocessing, but I ran into this:

If I try to put a CUDA memory frame into a multiprocessing.Queue() (self.q_numpy_frame) in a separate process, I get “TypeError: can’t pickle PyCapsule objects”.

color_image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
cuda_frame = jetson_utils_python.cudaFromNumpy(color_image)
self.q_numpy_frame.put((cuda_frame, width, height))

The traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle PyCapsule objects

Do you have a clue why? What is the workaround? Thanks.

Hi benkelaci, it doesn’t seem like Python’s multiprocessing package supports PyCapsule objects. Instead, you might want to use Python’s threading module to spawn threads and pass around the object returned from cudaFromNumpy().
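To illustrate the threading suggestion, here is a minimal sketch. It assumes a producer/consumer layout that the original post does not show, and it uses a hypothetical FakeCudaFrame class as a stand-in for the object returned by cudaFromNumpy(), so it runs without a Jetson. The point is that queue.Queue hands over object references between threads of one process and never pickles them.

```python
import pickle
import queue
import threading


class FakeCudaFrame:
    """Hypothetical stand-in for the capsule returned by cudaFromNumpy()."""

    def __init__(self, frame_id):
        self.frame_id = frame_id

    def __reduce__(self):
        # Mimic an object that refuses to be pickled.
        raise TypeError("can't pickle FakeCudaFrame objects")


def producer(q, n_frames):
    # In the real program this would grab a frame and call cudaFromNumpy().
    for i in range(n_frames):
        q.put((FakeCudaFrame(i), 640, 480))
    q.put(None)  # sentinel: no more frames


def consumer(q, seen):
    # In the real program this would run inference on each frame.
    while True:
        item = q.get()
        if item is None:
            break
        frame, width, height = item
        seen.append((frame.frame_id, width, height))


def run(n_frames=3):
    q = queue.Queue(maxsize=4)  # thread-safe, passes references only
    seen = []
    threads = [
        threading.Thread(target=producer, args=(q, n_frames)),
        threading.Thread(target=consumer, args=(q, seen)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(run())
```

Because both threads live in the same process, the consumer receives the very same object the producer created, which is what a pointer-carrying capsule needs.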

For more info, see this related issue on GitHub: https://github.com/dusty-nv/jetson-inference/issues/366

I don’t think what you have there can go any faster in pure Python. Even if you use threading and spawn multiple OS threads, thanks to Python’s GIL, only one thread will run at a time. Calls to C code like cv2.cvtColor are usually no exception unless they’re written in a very specific way:
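If you want to see the GIL effect for yourself, a rough sketch like the one below compares a pure-Python, CPU-bound loop run twice in sequence against the same loop run in two threads. The function names and loop size are made up for illustration. On a standard CPython build with the GIL, the two timings usually come out close; it says nothing about any particular C extension.

```python
import threading
import time


def count_down(n):
    # Pure-Python, CPU-bound loop; it needs the GIL for every iteration.
    while n > 0:
        n -= 1
    return n


def timed_sequential(n, repeats):
    # Run the loop `repeats` times, one after another.
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def timed_threaded(n, repeats):
    # Run the same loop in `repeats` threads at once.
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2000000
    print("sequential: %.3f s" % timed_sequential(n, 2))
    print("threaded:   %.3f s" % timed_threaded(n, 2))
```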

https://stackoverflow.com/questions/42006337/python-c-api-is-it-thread-safe

https://opensource.com/article/17/4/grok-gil

Yes, mdegans, I agree: because of the GIL, it will run at the same speed.

So do you think I should use a C module instead of the current Python code, so that it will not be blocked by the GIL?

Yeah, I tried it with threading. The speed is similar to multiprocessing (in the case where the conversion and inference run in the same process).

If you could change the PyCapsule object to some pickle-compatible type, we could reach a better speed (around 20 FPS) than the current one (12 FPS). Is that possible?

Another idea: I am using a webcam and read it with OpenCV’s VideoCapture(), which is why I need the cudaFromNumpy() function. Is there a good, faster alternative to this approach?

I don’t think that’s possible. Will explain later. On mobile atm.

Edit: I don’t think it’s possible because cudaFromNumpy(color_image) returns an encapsulated pointer inside its PyCapsule, iirc, and even if you could pickle and send that address around, it wouldn’t be very useful to another process.
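A small sketch of the point about capsules, runnable on plain CPython without a Jetson: it builds a PyCapsule around an ordinary host buffer through ctypes (an assumption for illustration; a real cudaFromNumpy() result would wrap a CUDA pointer) and checks what pickle will accept. The helper can_pickle is hypothetical.

```python
import ctypes
import pickle

# PyCapsule_New(pointer, name, destructor) is part of the CPython C API.
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

# Stand-in for a frame buffer: 16 zero bytes in host memory.
pixels = ctypes.create_string_buffer(16)

# The capsule stores only the address of the buffer, not its contents.
capsule = PyCapsule_New(ctypes.addressof(pixels), None, None)


def can_pickle(obj):
    """Return True if pickle.dumps() accepts obj, False on TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print("capsule picklable:  ", can_pickle(capsule))
    print("raw bytes picklable:", can_pickle(pixels.raw))
```

The capsule is just an address that is only meaningful inside the process that owns the memory, whereas the raw bytes are data that can be copied to another process.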

Yes, if you want to use gstreamer. I believe gstreamer has some built-in elements that will let you connect to a webcam source. For example:

https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-good/html/gst-plugins-good-plugins-rtspsrc.html

Then you just build a pipeline with that element at the beginning. No matter what, you’re going to have to convert into the format required by the rest of your pipeline (" … ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! … ").

You can find some examples from Nvidia that you can adapt in their Accelerated GStreamer User’s Guide:
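As a rough sketch of what such a pipeline could look like for a USB webcam, the helper below only assembles the gst-launch-1.0 arguments in Python. The device path, resolution, the choice of v4l2src as the source element, and nvoverlaysink as the sink are assumptions for illustration and have not been tested on hardware here; the function names are hypothetical.

```python
import shlex


def webcam_pipeline(device="/dev/video0", width=640, height=480):
    """Return gst-launch-1.0 arguments for a hypothetical webcam preview."""
    return [
        "gst-launch-1.0",
        "v4l2src", "device=%s" % device, "!",
        # Raw frames from the camera in system memory.
        "video/x-raw,width=%d,height=%d" % (width, height), "!",
        # Convert and copy into NVMM memory for the accelerated elements.
        "nvvidconv", "!",
        "video/x-raw(memory:NVMM)", "!",
        "nvoverlaysink",
    ]


def as_shell_command(args):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(as_shell_command(webcam_pipeline()))
```

The argument list can also be handed to subprocess.run() directly, which avoids the shell quoting around the caps string altogether.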

https://developer.download.nvidia.com/embedded/L4T/r32_Release_v1.0/Docs/Accelerated_GStreamer_User_Guide.pdf

Although I would recommend also doing the Gstreamer tutorials in C, even if you don’t know C, since gstreamer looks like C in any language (so you might as well just use C, or at least know how to do it in C).

https://gstreamer.freedesktop.org/documentation/tutorials/basic/hello-world.html?gi-language=c

Edit: I forgot to mention that gstreamer has native support for threading and queues, and much of it is handled for you.

https://gstreamer.freedesktop.org/documentation/tutorials/basic/multithreading-and-pad-availability.html?gi-language=c

Why is that not possible?

My problem with gstreamer is that on VNC gives failed. Can you give/show me a workable example with gstreamer?

I edited my post above to explain why.

I’m not sure what you mean by “on vnc gives failed”. Can you elaborate?

Using Nvidia’s DeepStream components for gstreamer, which will be coming to nano soon (hopefully).

It would be something like “rtspsrc ! (do some parsing, decoding, and conversion here) ! nvinfer model-file=birds.caffemodel (more options here) ! … ! somesink”

The documentation is here:

https://developer.download.nvidia.com/compute/machine-learning/deepstream/secure/3.0/Jetson/DeepStream_3.0_Plugin_Manual_Xavier.pdf

It is for Xavier, but I am assuming much will be the same for Nano. You can get an idea of the pipeline elements provided and what you’ll be able to do.

I found a post here dealing with an rtsp source, and off that thread more examples are linked. The inference won’t work on Nano right now, but you should be able to build your pipeline and have it ready for when DeepStream 4 is released for Nano. For example, “gst-launch-1.0 rtspsrc location=rtsp://admin:pass@192.168.30.61/Streaming/Channels/102/ ! rtph264depay ! h264parse ! omxh264dec ! nvoverlaysink” should work on the Nano right now to decode and play back an rtsp stream (example from that thread).