I’m working on porting a project from a different platform (Raspberry Pi + Intel OpenVINO) to a Jetson Nano. The gist of the project: I need to read an image from a camera or video, do some pre-processing on it, and then run the image through one or more neural networks for things like face detection, pose detection, etc. The pre-processing consists of rotating the original image by several different angles, building a composite image of all of the rotations, and then passing that composite to the neural network(s).
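For reference, my pre-processing step is roughly equivalent to the sketch below. I've used numpy and 90-degree rotations here just to keep the example self-contained; the real code uses OpenCV (cv2.getRotationMatrix2D + cv2.warpAffine) for arbitrary angles, and the function/parameter names are just illustrative:

```python
import numpy as np

def build_composite(frame: np.ndarray, num_rotations: int = 4) -> np.ndarray:
    """Rotate the frame several times and tile the results into one image.

    np.rot90 stands in for the arbitrary-angle warpAffine rotation in the
    real pipeline, so this sketch stays dependency-free.
    """
    rotations = [np.rot90(frame, k) for k in range(num_rotations)]
    # Square frames keep their shape under 90-degree rotation,
    # so the composite is a simple horizontal tiling.
    return np.hstack(rotations)

# A tiny 3x3 "frame" for illustration:
frame = np.arange(9, dtype=np.uint8).reshape(3, 3)
composite = build_composite(frame)
assert composite.shape == (3, 12)  # four 3x3 tiles side by side
```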
I’m new to the NVIDIA/Jetson API, so I’ve started with the dusty-nv Hello-AI samples (posenet.py, for example). The image captured from the input is in cudaImage format, and working with that is much faster than a standard OpenCV image on the Pi (obviously due to GPU acceleration), but I haven’t been able to find documentation for many of the operations I need on this format. For example, I have not found any Python example that shows how to rotate a cudaImage. To work around this I converted the cudaImage to an OpenCV image (by converting it to a numpy array and swapping the BGR/RGB channel order), but profiling my code shows that the cudaImage->OpenCV conversion is fairly slow (20-30 ms), and converting the OpenCV image back to a cudaImage for processing by the posenet class is even slower (40-60 ms). That conversion time ends up negating any performance benefit the Jetson’s GPU acceleration provides.
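In case it helps, here is roughly what my round-trip looks like. The jetson_utils calls are shown as comments since they only run on the device; the executable part below just demonstrates the channel swap itself, which is a reversed slice on the last axis:

```python
import numpy as np

# On the Jetson, my round-trip is approximately (device-only, shown as comments):
#   frame = input.Capture()                  # cudaImage, RGB channel order
#   arr   = jetson_utils.cudaToNumpy(frame)  # numpy view of the CUDA buffer
#   bgr   = arr[..., ::-1]                   # RGB -> BGR so OpenCV is happy
#   ... OpenCV rotate/composite work ...
#   back  = jetson_utils.cudaFromNumpy(bgr[..., ::-1])  # BGR -> RGB, back to cudaImage

# The channel swap is just numpy slicing:
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255            # pure red in RGB order
bgr = rgb[..., ::-1]         # red now lives at index 2
assert bgr[0, 0, 2] == 255 and bgr[0, 0, 0] == 0
```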
Is there a better way to do this? Is there some document that explains how to manipulate cudaImage objects beyond just resizing and such? Or should I not be using the Hello-AI classes at all? If so, is there a guide for porting a non-NVIDIA codebase to the Jetson?
Sorry for the likely dumb questions, but I’m completely new to the NVIDIA ecosystem and it’s a little overwhelming to just jump right in.