I have a program written in Python on the Jetson Nano (Ubuntu 18.04 with JetPack 4.6.1). It has two threads: one runs inference via the jetson_inference library, and the other does face detection with dlib (compiled with CUDA support). The jetson_inference call takes about 190 ms and the dlib face detection takes about 400 ms. I would expect the total processing time for a frame to be roughly the maximum of the two threads (~400 ms), but what I’m actually seeing is closer to the sum of the two.
I added extra logging to record when each inference starts and ends. Even though both worker threads begin their loops at the same time (the main thread feeds them their inputs and then waits for their results), the actual inference calls appear to execute only one at a time.
Is there an issue or limitation with running multiple inferences concurrently, or am I more likely just doing something dumb in my code that causes one thread to block the other? There are no explicit locks or anything else that should cause blocking between the threads; in fact, the two threads don’t communicate with each other at all.
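For reference, here is a simplified sketch of my thread structure. The worker bodies below are illustrative stand-ins (short sleeps in place of the real calls); in the actual program the commented lines are where jetson_inference's detectNet and dlib's CNN face detector run:

```python
import threading
import queue
import time

def inference_worker(in_q, out_q):
    # Stand-in for the jetson_inference thread (~190 ms per frame in the real code)
    while True:
        frame = in_q.get()
        if frame is None:  # sentinel to shut down
            break
        # detections = net.Detect(frame)   # actual jetson_inference call
        time.sleep(0.01)                   # placeholder for the inference work
        out_q.put(("inference", frame))

def face_worker(in_q, out_q):
    # Stand-in for the dlib face-detection thread (~400 ms per frame in the real code)
    while True:
        frame = in_q.get()
        if frame is None:
            break
        # faces = cnn_face_detector(frame)  # actual dlib call
        time.sleep(0.01)                    # placeholder for the detection work
        out_q.put(("faces", frame))

# Main thread: feed both workers the same frame, then wait for both results.
inf_in, face_in, results = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=inference_worker, args=(inf_in, results), daemon=True).start()
threading.Thread(target=face_worker, args=(face_in, results), daemon=True).start()

frame = "frame-0"  # stand-in for a camera frame
inf_in.put(frame)
face_in.put(frame)
got = {results.get()[0] for _ in range(2)}  # block until both threads report back
print(got)
```

That's the whole extent of the inter-thread interaction: the main thread pushes one input to each worker and then blocks on the result queue until both results arrive.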
Thanks in advance!