Shared memory parallel processing for jetson inference

daniel181 · July 25, 2022, 10:22pm

Hello,

I am working on a Jetson Nano with JetPack 4.5.1.

I am working in Python 3.6 and attempting to create a producer/consumer system to efficiently perform inference and other operations on images. I am hoping to use one producer process and several consumer processes that share memory, so that multiple cores can encode JPEG binaries, perform object detection, and more in parallel.

Basically, the goal is a camera running at 30fps with multiple processes pulling from it simultaneously. I’m not worried about each process fetching every frame; rather, each process should be able to pull the most recent frame whenever it is ready.

EX: Camera pulling frames at 30fps
→ Object detection at 20fps
→ filtering the image and encoding a jpeg binary to send elsewhere at 28fps
→ saving images locally at 5fps

I am using numpy and the Python multiprocessing module, but I am running into very strange issues that I suspect come from messing with CUDA zero-copy memory. Could anyone point me to better tools for distributing a video source to multiple processes for performance?

Thanks

dusty_nv · July 26, 2022, 1:53am

Hi @daniel181, CUDA memory isn’t shared across processes, so the threads would need to be intra-process.
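As a rough illustration of the intra-process approach, here is a minimal sketch of a "latest frame" slot shared between one producer thread and several consumer threads. It uses only the standard library and runs on Python 3.6. The names `LatestFrame`, `producer`, `consumer` and `run_demo` are made up for this example, and the string frames and `time.sleep()` calls are stand-ins for real capture and processing; none of this is part of jetson-inference.

```python
import threading
import time


class LatestFrame:
    """Single-slot mailbox: the producer overwrites it, consumers read the newest frame."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0  # incremented on every publish

    def publish(self, frame):
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def wait_newer(self, last_seq, timeout=None):
        """Block until a frame newer than last_seq exists.

        Returns (seq, frame), or None if the timeout expires first.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._seq > last_seq, timeout):
                return None
            return self._seq, self._frame


def producer(slot, n_frames, period):
    for i in range(n_frames):
        slot.publish("frame-%d" % i)  # stand-in for a captured image
        time.sleep(period)


def consumer(slot, work_time, seen, stop):
    last = 0
    while not stop.is_set():
        got = slot.wait_newer(last, timeout=0.1)
        if got is None:
            continue
        last, _frame = got
        seen.append(last)
        time.sleep(work_time)  # stand-in for detection / encoding / saving


def run_demo(n_frames=30, period=0.005):
    slot = LatestFrame()
    stop = threading.Event()
    fast, slow = [], []
    workers = [
        threading.Thread(target=consumer, args=(slot, 0.0, fast, stop)),
        threading.Thread(target=consumer, args=(slot, 0.02, slow, stop)),
    ]
    for w in workers:
        w.start()
    producer(slot, n_frames, period)
    stop.set()
    for w in workers:
        w.join()
    return fast, slow
```

Each consumer remembers the sequence number it last processed, so a slow consumer simply skips the frames it missed instead of falling behind. Consumers receive a reference to the same frame object rather than a copy, so they should treat it as read-only.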

The most efficient model would ideally be to perform your image processing operations on the GPU and just queue them in a pipeline with the inferencing. There’s typically no need for CPU multithreading with that.

daniel181 · July 26, 2022, 2:22pm

Sounds good. Do you mind pointing me in the direction of where to learn how to queue a GPU pipeline?

dusty_nv · July 26, 2022, 3:03pm

CUDA kernels are inherently executed asynchronously: a kernel launch returns to the CPU right away while the GPU works through the queued operations. The CPU only waits for the GPU when you synchronize a CUDA stream or call cudaDeviceSynchronize().

If you are using numpy for your operations today, you may want to look into CuPy, which is like a CUDA version of numpy.
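To make that concrete, here is a small sketch of queuing a few numpy-style operations on a CUDA stream with CuPy and synchronizing only once at the end. The filter itself (`brighten_and_threshold`, with its `gain` and `cutoff` parameters) is a made-up example, not something from jetson-inference. The sketch assumes CuPy is installed with a usable CUDA device and falls back to plain numpy otherwise, so the same function can be tried on a machine without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if there is no usable CUDA device
    HAVE_GPU = True
except Exception:
    cp = None
    HAVE_GPU = False


def brighten_and_threshold(frame, gain=1.5, cutoff=128):
    """Hypothetical filter: scale pixels, clip to the 8-bit range, then binarize."""
    if not HAVE_GPU:
        # CPU fallback with the same arithmetic.
        scaled = np.clip(frame.astype(np.float32) * gain, 0, 255)
        return (scaled >= cutoff).astype(np.uint8) * 255

    stream = cp.cuda.Stream(non_blocking=True)
    with stream:
        gpu_frame = cp.asarray(frame)  # host -> device copy
        # These operations are queued on the stream; the CPU does not wait for them.
        scaled = cp.clip(gpu_frame.astype(cp.float32) * gain, 0, 255)
        mask = (scaled >= cutoff).astype(cp.uint8) * 255
    stream.synchronize()  # the CPU waits for the queued work only here
    return cp.asnumpy(mask)  # device -> host copy of the result
```

Usage would look like `mask = brighten_and_threshold(frame)` with `frame` being a `uint8` numpy array. Keeping intermediate results on the GPU and copying back only the final output is what avoids most of the transfer overhead.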

