Limit GPU usage per process

I’m running a process with two threads: a generic one and one doing object detection with YOLO. YOLO-V3 inference on a TX2 through the DarkNet API takes about 500 ms, and during that time the GPU runs at 100%. This causes all the other processes and threads running on the CPU to get stuck. Why is there a connection between the two?
Is there a way (not via the DarkNet API) to limit GPU usage per process or per thread using NVIDIA tools?


You could try attaching the inference job to a dedicated CUDA stream.

Tasks in the same CUDA stream are executed sequentially.
Tasks in different streams, however, can be executed in parallel.
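
As a rough illustration of the stream behaviour described above, here is a minimal sketch using the CUDA runtime API. The `scale` kernel, the buffer names, and the stream names `inferStream` and `otherStream` are hypothetical placeholders, not part of DarkNet; wiring a stream into DarkNet's own kernel launches would require changes to its source, which this sketch does not attempt.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for real GPU work (not a DarkNet kernel).
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Abort main() with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr;
    CHECK(cudaMalloc(&a, bytes));
    CHECK(cudaMalloc(&b, bytes));

    // Non-blocking streams do not synchronize implicitly with the default stream.
    cudaStream_t inferStream, otherStream;
    CHECK(cudaStreamCreateWithFlags(&inferStream, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&otherStream, cudaStreamNonBlocking));

    // Initialize each buffer in the stream that will later use it.
    CHECK(cudaMemsetAsync(a, 0, bytes, inferStream));
    CHECK(cudaMemsetAsync(b, 0, bytes, otherStream));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Same stream: the second launch starts only after the first finishes.
    scale<<<blocks, threads, 0, inferStream>>>(a, n, 2.0f);
    scale<<<blocks, threads, 0, inferStream>>>(a, n, 0.5f);

    // Different stream: this launch may overlap with the work above,
    // if the GPU has free resources. Overlap is not guaranteed.
    scale<<<blocks, threads, 0, otherStream>>>(b, n, 3.0f);
    CHECK(cudaGetLastError());

    // Wait for each stream independently.
    CHECK(cudaStreamSynchronize(inferStream));
    CHECK(cudaStreamSynchronize(otherStream));

    CHECK(cudaStreamDestroy(inferStream));
    CHECK(cudaStreamDestroy(otherStream));
    CHECK(cudaFree(a));
    CHECK(cudaFree(b));
    printf("done\n");
    return 0;
}
```

Assuming the file is saved as `streams.cu`, it should build with `nvcc streams.cu -o streams`. Note that a separate stream only lets independent work be queued separately; it does not cap how much of the GPU the inference kernels occupy.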