Switch tasks during inference

While developing with TensorRT, I ran into the following requirement: while a task A is executing, if a higher-priority task B arrives, A should be suspended as soon as it finishes its current layer, and the GPU should be handed over to B. Does TensorRT provide an API for this kind of scheduling, or do I need to use CUDA directly? If it is the latter, is there any detailed documentation on how to parse the TensorRT network?
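For context, the closest mechanism I have found so far is CUDA stream priorities. Below is a minimal sketch of that approach; note that it is only an approximation of the requirement, since stream priorities bias which pending work the GPU schedules next rather than suspending a task at a layer boundary. The names `contextA`, `contextB`, and `launchWithPriorities` are hypothetical, and the two `IExecutionContext` objects are assumed to be built elsewhere with their input/output tensor addresses already bound.

```cuda
// Sketch only: approximating high/low-priority tasks with CUDA stream
// priorities. This does NOT preempt a running task at layer boundaries;
// it only makes the GPU prefer the high-priority stream's pending kernels.
#include <cuda_runtime.h>
#include <NvInfer.h>

// contextA / contextB: hypothetical execution contexts for tasks A and B,
// assumed to be created elsewhere and to have their I/O tensors bound.
void launchWithPriorities(nvinfer1::IExecutionContext* contextA,
                          nvinfer1::IExecutionContext* contextB)
{
    // Query the valid priority range; a numerically lower value means
    // a higher priority on current CUDA versions.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t streamA, streamB;
    cudaStreamCreateWithPriority(&streamA, cudaStreamNonBlocking, leastPriority);
    cudaStreamCreateWithPriority(&streamB, cudaStreamNonBlocking, greatestPriority);

    // Enqueue task A on the low-priority stream; when task B arrives,
    // enqueue it on the high-priority stream so its kernels are
    // scheduled ahead of A's not-yet-launched layers.
    contextA->enqueueV3(streamA);
    contextB->enqueueV3(streamB);

    cudaStreamSynchronize(streamA);
    cudaStreamSynchronize(streamB);
    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
}
```

Even with this, a layer of A that has already been launched runs to completion before B's kernels take over, which is why I am asking whether TensorRT offers anything closer to true per-layer suspension.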