Throttling concurrent streams on a GPU

How can I limit the number of concurrent streams on a GPU? Say I’ve determined by trial and error that 4 streams can process concurrently on a GPU, but 5 will cause an OOM error. CUDA doesn’t know that and keeps pushing streams onto the GPU until OOM. Is there a configuration parameter I can set, or do I have to do it in my app code?

Thanks, Roger

As far as I can tell, you have to do it in your code. I limit the number of in-flight streams by performing a device sync after every 4 streams on each device.
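A minimal host-side sketch of that batching approach, assuming a pool of 4 streams and a hypothetical `process` kernel working on per-stream buffers (the task count, buffer size, and kernel are illustrative, not from the original posts):

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for the real workload.
__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int kNumTasks      = 16;
    const int kMaxConcurrent = 4;       // found empirically, as in the question
    const int kN             = 1 << 20;

    cudaStream_t streams[kMaxConcurrent];
    float *buffers[kMaxConcurrent];
    for (int s = 0; s < kMaxConcurrent; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buffers[s], kN * sizeof(float));
    }

    for (int t = 0; t < kNumTasks; ++t) {
        int s = t % kMaxConcurrent;     // round-robin over the stream pool
        process<<<(kN + 255) / 256, 256, 0, streams[s]>>>(buffers[s], kN);
        // After every batch of kMaxConcurrent launches, wait for all of
        // them to drain before issuing more, capping GPU memory in use.
        if (s == kMaxConcurrent - 1)
            cudaDeviceSynchronize();
    }
    cudaDeviceSynchronize();            // drain any final partial batch

    for (int s = 0; s < kMaxConcurrent; ++s) {
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}
```

The `cudaDeviceSynchronize()` is a blunt barrier: it idles the whole device between batches. A finer-grained variant would call `cudaStreamSynchronize(streams[s])` on just the stream being reused, so the other 3 streams keep working while one slot is recycled.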