I am using computer vision with trt_pose (GitHub: NVIDIA-AI-IOT/trt_pose, real-time pose estimation accelerated with NVIDIA TensorRT) running standalone at the edge on a Jetson Xavier NX mounted on a robot. The robot has some hobby servos and Dynamixel servos that the Jetson controls to move the robot based on the output from the trt_pose model.
The main loop can basically be broken into 3 parts:
- grab the frame from GStreamer and preprocess it for the model,
- run the model on the frame (inference),
- post-process the model's output to move the robot in the real world.
I have timed each part and get:
Part 1 is 0.006 sec, Part 2 is 0.017 sec, and Part 3 is 0.022 sec.
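For reference, the loop is roughly structured and timed like this (a minimal sketch; `grab_and_preprocess`, `run_inference` and `move_robot` are placeholder names for my actual functions):

```python
import time

while True:
    t0 = time.perf_counter()
    frame = grab_and_preprocess()       # Part 1: GStreamer capture + preprocessing (single CPU core)
    t1 = time.perf_counter()
    keypoints = run_inference(frame)    # Part 2: trt_pose inference (all CPU cores + GPU)
    t2 = time.perf_counter()
    move_robot(keypoints)               # Part 3: post-processing + servo commands (single CPU core)
    t3 = time.perf_counter()
    print(f"part1={t1 - t0:.3f}s  part2={t2 - t1:.3f}s  part3={t3 - t2:.3f}s")
```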
Part 3 is the slowest part of the loop, yet computationally it is the lightest. Timing each step of Part 3, I find that sending the data (a dozen bytes) to the Dynamixel servos via the USB port at a baud rate of 1 Mbps is what holds up the process. I assume Python or the OS is blocking here and everything sits idle waiting on this write.
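For what it's worth, the slow step looks roughly like this (a sketch using the dynamixel_sdk Python bindings; the device path, servo ID, Protocol 2.0 control-table address and 4-byte goal position are assumptions for illustration and depend on the servo model):

```python
import time
from dynamixel_sdk import PortHandler, PacketHandler

ADDR_GOAL_POSITION = 116              # assumed Protocol 2.0 goal-position address (X-series style)

port = PortHandler('/dev/ttyUSB0')    # assumed USB serial adapter path
packet = PacketHandler(2.0)
port.openPort()
port.setBaudRate(1000000)             # 1 Mbps

def write_goal(dxl_id, position):
    t0 = time.perf_counter()
    # Blocking write: TxRx waits for the servo's status packet before returning
    result, error = packet.write4ByteTxRx(port, dxl_id, ADDR_GOAL_POSITION, position)
    print(f"servo {dxl_id}: write took {time.perf_counter() - t0:.4f}s")
```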
Parts 1 and 3 run on the CPU on a single core only, while Part 2 runs on all 6 CPU cores and all GPU cores.
What I would like to do in Python is run Part 3 in parallel with Parts 1 and 2, i.e. once Part 2 finishes, the loop goes straight back to Part 1 while Part 3 runs in parallel. What is the best way to achieve this?
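One approach that comes to mind is a background worker thread that owns the servo I/O and is fed by a small queue, so the main loop can return to Part 1 while the serial write blocks in the worker. This is only a minimal sketch of that idea, assuming the blocking serial write releases the GIL (as pyserial's underlying OS write does); the function names are the same placeholders as above:

```python
import queue
import threading

cmd_queue = queue.Queue(maxsize=1)    # keep only the newest pose result

def servo_worker():
    while True:
        keypoints = cmd_queue.get()
        if keypoints is None:         # shutdown sentinel
            break
        move_robot(keypoints)         # Part 3 runs here, off the main loop

threading.Thread(target=servo_worker, daemon=True).start()

while True:
    frame = grab_and_preprocess()     # Part 1
    keypoints = run_inference(frame)  # Part 2
    try:
        cmd_queue.put_nowait(keypoints)   # hand Part 3 to the worker and loop straight back to Part 1
    except queue.Full:
        pass                              # worker still busy with the previous result; skip this one
```

Is a worker thread fed by a queue a sensible pattern here, or would multiprocessing or asyncio be a better fit?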