While developing with TensorRT, I ran into the following requirement: while a task A is executing, if a higher-priority task B arrives, then A should be suspended once it finishes its current layer, and the GPU should be handed over to B. Is there an API in TensorRT for this kind of scheduling? Or do I need to use CUDA directly? If it is the latter, is there any detailed documentation on how to parse the TensorRT network?
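TensorRT itself does not expose a preemptive scheduler, but one common approach is to run each task on its own CUDA stream created with a different priority: when kernels from both streams are pending, the GPU scheduler favors the higher-priority stream at kernel (roughly layer) boundaries, which approximates the "suspend after the current layer" behavior described above. The sketch below assumes two already-built `IExecutionContext` objects (`ctxA`, `ctxB`) with their input/output tensor addresses already set; those names, and the use of `enqueueV3` (TensorRT 8.5+), are assumptions, not confirmed API usage from this thread.

```cpp
#include <cuda_runtime.h>
#include <NvInfer.h>

// Sketch: run task A on a low-priority stream and task B on a
// high-priority stream. Note: in CUDA, a numerically LOWER value
// means HIGHER priority.
void runWithPriorities(nvinfer1::IExecutionContext* ctxA,
                       nvinfer1::IExecutionContext* ctxB)
{
    int leastPriority = 0, greatestPriority = 0;
    // Query the valid priority range for this device.
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t streamA, streamB;
    // Task A: lowest priority.
    cudaStreamCreateWithPriority(&streamA, cudaStreamNonBlocking,
                                 leastPriority);
    // Task B: highest priority; its kernels are preferred by the
    // hardware scheduler at kernel boundaries.
    cudaStreamCreateWithPriority(&streamB, cudaStreamNonBlocking,
                                 greatestPriority);

    // Enqueue inference asynchronously on each stream (assumes the
    // contexts' tensor addresses were set beforehand with
    // setTensorAddress).
    ctxA->enqueueV3(streamA);
    ctxB->enqueueV3(streamB);  // B's layers win contention over A's

    cudaStreamSynchronize(streamA);
    cudaStreamSynchronize(streamB);

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
}
```

Note that stream priority influences which pending kernels are scheduled next; it does not forcibly interrupt a kernel that is already running, so the granularity is per kernel launch (roughly per layer), which matches the requirement above. There is no need to parse the network manually for this approach.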
Hope the following link helps you.
If you need further assistance, we recommend posting your concern here to receive better help.