I am running inference using TensorRT. While the CPU is waiting for the TensorRT async inference, CPU usage remains high. My goal is to free up the CPU so that it can be used by other threads.
My understanding is that I need to use the flag SCHED_YIELD. However, I must not be using it properly, since it doesn't seem to have the desired effect.
My code:
import pycuda.driver as cuda
cuda.init()
cuda.Device(0).make_context(flags=cuda.ctx_flags.SCHED_YIELD)
If I change the flag to cuda.ctx_flags.SCHED_SPIN I get the same performance.
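If my understanding is right, this would explain it: SCHED_YIELD still polls (it just yields the core to other runnable threads between polls), whereas a blocking wait would actually put the thread to sleep. The difference can be sketched in plain Python (this is only an analogy with standard-library threads, not PyCUDA code):

```python
import threading
import time

def spin_wait(done):
    # Rough analogue of SCHED_SPIN: busy-poll the flag,
    # keeping one CPU core at 100% the whole time.
    while not done.is_set():
        pass

def blocking_wait(done):
    # Rough analogue of a blocking sync: the thread sleeps on a
    # primitive until signalled, leaving the core free.
    done.wait()

done = threading.Event()
worker = threading.Thread(target=blocking_wait, args=(done,))
worker.start()
time.sleep(0.1)   # stand-in for the GPU doing the actual work
done.set()        # "inference finished" signal
worker.join()
print("worker finished without spinning")
```

In this analogy, what I am after is the blocking_wait behavior while the GPU works.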
I also tried to initialize cuda like this:
cuda.init(flags=cuda.ctx_flags.SCHED_YIELD)
but it throws the error
pycuda._driver.LogicError: cuInit failed: invalid argument
How can I enable the desired setting? Is SCHED_YIELD really what I need?
My inference code is based on this Stack Overflow question: tensorflow - ERROR: engine.cpp (370) - Cuda Error in ~ExecutionContext: 77 - Stack Overflow, but I adapted it to call TensorRT inference from multiple threads (and therefore multiple contexts).