Hello, we have a main-thread loop that pulls in images, uploads them to the GPU with VPI, performs a perspective warp, then locks the result back to the CPU and sends it to our endpoint:
frame1 = vpi.asimage(timestamp_img)
frame1 = frame1.perspwarp(hom)
with frame1.rlock_cpu() as data:
    ...  # send data to endpoint
This main thread creates a child thread that uses a Jetson Inference detectNet object:
net = detectNet(...)  # model args omitted
img = ...  # get image somehow
cuda_mem = jetson_utils.cudaFromNumpy(img)
detections = net.Detect(cuda_mem)
The child thread also runs in a loop performing detections, but it is too slow to put in the main thread, since we need 30 fps.
My question is: how do I share the input images from the main thread with the child thread? The images are 2 * 4K, so I am hoping for something performant. Ideally I could share/copy the VPI GPU object like a pointer and pass it through a Python queue. If I pass copied images through a queue it's very slow, and using shared memory feels hacky.
Any help appreciated
The particulars of our application require this parallelism: the output to the user must be 30 fps and not limited by inference, which will run at a lower frame rate due to the large images.
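One common pattern for this (a pure-Python sketch, not an official recommendation from this thread): pass the frame *object* through a bounded queue of size 1 and drop stale frames, so the inference thread always sees the newest frame and the producer never blocks. A Python queue carries only a reference, so with a cudaImage or VPI image handle this avoids copying the 4K pixel data — though you must not overwrite the underlying buffer while the consumer is still reading it. Here plain dicts stand in for the image handles:

```python
import queue
import threading

def publish_latest(q, frame):
    """Put frame on the queue, discarding the stale frame if the consumer is behind."""
    try:
        q.put_nowait(frame)
    except queue.Full:
        try:
            q.get_nowait()  # drop the stale frame the consumer never took
        except queue.Empty:
            pass
        q.put_nowait(frame)

q = queue.Queue(maxsize=1)
results = []

def consumer():
    # Stand-in for the inference loop: net.Detect(frame) on each frame received.
    while True:
        frame = q.get()
        if frame is None:  # sentinel: shut down
            break
        results.append(frame["id"])

t = threading.Thread(target=consumer)
t.start()
# Producer pushes faster than the consumer drains; intermediate frames get dropped.
for i in range(5):
    publish_latest(q, {"id": i})  # in the real app: a cudaImage / VPI image handle
q.put(None)
t.join()
```

The consumer may skip intermediate frames, but it always processes the most recent one — which is usually the right semantics when inference runs slower than capture.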
Hi @liellplane, please refer to this other recent topic and my suggestion there to use a cudaImage mapping as the output of VPI, so that the data is already in a cudaImage:
The detectNet is going to downsample your images to 300x300 for inference anyway, so I might recommend downsampling them to a resolution lower than 4K on the VPI side before you even copy them to the inference thread.
Yes, I have done basic Python multithreading with it. It does get more complicated to manage once you start doing CUDA processing from multiple threads/streams and need to handle the synchronization. The cudaImages are typically allocated as "mapped" memory, so they can be accessed from the CPU too.
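Since mapped cudaImages are CPU-accessible, another option is to preallocate two buffers and double-buffer between the threads: the main thread writes the warped output into whichever slot the worker is not reading, then advertises it. A minimal pure-Python sketch of that pattern — the `DoubleBuffer` class and its names are mine, and bytearrays stand in for preallocated mapped cudaImages (e.g. from `jetson_utils.cudaAllocMapped`):

```python
import threading

class DoubleBuffer:
    """Two preallocated slots: the producer writes one slot while the
    consumer reads the other, so no per-frame allocation or copy queue."""
    def __init__(self, size):
        self.slots = [bytearray(size), bytearray(size)]
        self.ready = None  # index of the slot holding the newest frame
        self.lock = threading.Lock()
        self.has_frame = threading.Condition(self.lock)

    def write(self, data):
        with self.has_frame:
            idx = 0 if self.ready != 0 else 1  # write the slot not advertised
            self.slots[idx][:len(data)] = data
            self.ready = idx
            self.has_frame.notify()

    def read_latest(self):
        with self.has_frame:
            while self.ready is None:
                self.has_frame.wait()
            return bytes(self.slots[self.ready])  # copy out under the lock

buf = DoubleBuffer(8)
buf.write(b"frame-42")
latest = buf.read_latest()
```

The lock here serializes writer and reader for simplicity; with real CUDA buffers you would additionally synchronize the stream (e.g. wait on the VPI stream) before letting the other thread touch the slot.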