We are trying to run several TensorRT programs simultaneously. Each program is launched from its own std::thread and uses a separate stream. However, after profiling with NVVP, we found that the TensorRT programs do not actually execute at the same time.
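A minimal sketch of the threading pattern described above, assuming one std::thread per TensorRT program. It uses only the standard library so that it compiles on its own: `run_inference` and `run_workers` are hypothetical names, and `run_inference` is a stand-in for the real per-thread work (creating a stream and enqueueing inference), not the actual code from this setup.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one TensorRT program. In a real program this is
// where the thread would create its own stream and enqueue inference on its
// own execution context; here it only records that the worker ran.
void run_inference(int worker_id, std::vector<int>& done) {
    // Each worker writes only its own slot, so no lock is needed.
    done[worker_id] = 1;
}

// Launch one std::thread per program and wait for all of them to finish.
std::vector<int> run_workers(int num_workers) {
    std::vector<int> done(num_workers, 0);
    std::vector<std::thread> threads;
    threads.reserve(num_workers);
    for (int i = 0; i < num_workers; ++i) {
        threads.emplace_back(run_inference, i, std::ref(done));
    }
    for (auto& t : threads) {
        t.join();
    }
    return done;
}
```

Under these assumptions, calling `run_workers(2)` corresponds to the two-program case. Note that launching the work from separate host threads only makes the submissions independent; it does not by itself guarantee that the GPU work overlaps.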
The first image shows a single TensorRT program running, with its TensorRT calls executed one after another. In the second image, two TensorRT programs are running, and the TensorRT calls in both programs are not packed as tightly as when a single program runs alone.
Is there another way to run several TensorRT programs fully concurrently, so that the programs do not affect each other?