Switch tasks during inference

While developing with TensorRT, I ran into the following requirement: while task A is executing, if a higher-priority task B arrives, A should be suspended as soon as it finishes its current layer, and the GPU should be handed over to B. Does TensorRT provide an API for this kind of scheduling, or do I need to drop down to CUDA? If it is the latter, are there any detailed instructions on how to parse the TensorRT network?

Hi @382535941,

We hope the following link helps you.
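In the meantime, one common CUDA-level approach to prioritizing one inference workload over another is stream priorities. The sketch below is only an illustration, not an official TensorRT mechanism: the execution contexts and bindings in the comments are hypothetical names, and stream priority is a scheduling hint to the GPU — it favors the higher-priority stream's pending kernels but does not guarantee true per-layer preemption of an already-running task.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Query the priority range supported by this device.
    // Note: in CUDA, a numerically LOWER value means a HIGHER priority.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    // Create a low-priority stream for task A and a high-priority stream for task B.
    cudaStream_t streamA, streamB;
    cudaStreamCreateWithPriority(&streamA, cudaStreamNonBlocking, leastPriority);
    cudaStreamCreateWithPriority(&streamB, cudaStreamNonBlocking, greatestPriority);

    // Each TensorRT execution context would then be enqueued on its own stream,
    // e.g. (contextA/contextB and the binding arrays are hypothetical):
    //   contextA->enqueueV2(bindingsA, streamA, nullptr);
    //   contextB->enqueueV2(bindingsB, streamB, nullptr);
    // When kernels from B are pending, the GPU scheduler favors them over A's
    // not-yet-launched kernels; kernels already running are not interrupted.

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    printf("priority range: least=%d greatest=%d\n", leastPriority, greatestPriority);
    return 0;
}
```

Because preemption here happens only at kernel boundaries, this roughly matches the "finish the current layer, then yield" behavior described above, but the exact granularity depends on the hardware and driver.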

If you need further assistance, we recommend posting your concern here to receive better help.

Thank you.