Conditional D2H transfer in stream

I do online processing at high rates (>1000 Hz) with a 3-stage pipeline (H2D / KERNEL / D2H) for data rejection, using 3 overlapping streams.

The last step of the pipeline (the copy back, D2H) should be done conditionally, depending on the evaluation of a veto in KERNEL.

Stream operations are queued in advance for efficiency. For stages 1 and 2 this is not an issue, as they are always executed.
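To make the setup concrete, here is a minimal sketch of how the three stages are queued (a simplified illustration, not my production code; `vetoKernel`, `runPipeline`, the buffer sizes and the threshold are all placeholders I made up for this post). Note that the D2H copy is enqueued unconditionally, whatever the veto decides:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "CUDA error: %s (line %d)\n",         \
                    cudaGetErrorString(err_), __LINE__);          \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

constexpr int kStreams = 3;   // overlapping streams
constexpr int kN = 1 << 16;   // samples per frame (placeholder size)

// Placeholder veto: flag the frame as "keep" if any sample exceeds thr.
__global__ void vetoKernel(const float* in, float* out, int* keep, int n, float thr)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i];
        if (in[i] > thr) *keep = 1;   // every writer stores the same value
    }
}

// Queues `frames` frames round-robin on kStreams streams and returns how
// many frames the veto flagged as "keep".
int runPipeline(int frames)
{
    const size_t bytes = kN * sizeof(float);
    float *hIn, *hOut;
    int *hKeep;
    CHECK(cudaMallocHost(&hIn, frames * bytes));    // pinned, one slot per frame
    CHECK(cudaMallocHost(&hOut, frames * bytes));
    CHECK(cudaMallocHost(&hKeep, frames * sizeof(int)));

    // Test data: only odd frames contain a sample above the threshold.
    for (int f = 0; f < frames; ++f) {
        for (int i = 0; i < kN; ++i) hIn[(size_t)f * kN + i] = 0.0f;
        if (f % 2 == 1) hIn[(size_t)f * kN + 10] = 2.0f;
    }

    cudaStream_t st[kStreams];
    float *dIn[kStreams], *dOut[kStreams];
    int *dKeep[kStreams];
    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamCreate(&st[s]));
        CHECK(cudaMalloc(&dIn[s], bytes));
        CHECK(cudaMalloc(&dOut[s], bytes));
        CHECK(cudaMalloc(&dKeep[s], sizeof(int)));
    }

    // Everything is queued in advance; the host does not wait in this loop.
    for (int f = 0; f < frames; ++f) {
        int s = f % kStreams;
        // Stage 1: H2D
        CHECK(cudaMemcpyAsync(dIn[s], hIn + (size_t)f * kN, bytes,
                              cudaMemcpyHostToDevice, st[s]));
        // Stage 2: KERNEL (veto flag is reset first)
        CHECK(cudaMemsetAsync(dKeep[s], 0, sizeof(int), st[s]));
        vetoKernel<<<(kN + 255) / 256, 256, 0, st[s]>>>(dIn[s], dOut[s], dKeep[s], kN, 1.0f);
        CHECK(cudaGetLastError());
        // Stage 3: D2H, always executed even when the frame is vetoed
        CHECK(cudaMemcpyAsync(hOut + (size_t)f * kN, dOut[s], bytes,
                              cudaMemcpyDeviceToHost, st[s]));
        CHECK(cudaMemcpyAsync(hKeep + f, dKeep[s], sizeof(int),
                              cudaMemcpyDeviceToHost, st[s]));
    }
    CHECK(cudaDeviceSynchronize());

    int kept = 0;
    for (int f = 0; f < frames; ++f) kept += hKeep[f];

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaFree(dIn[s]));
        CHECK(cudaFree(dOut[s]));
        CHECK(cudaFree(dKeep[s]));
        CHECK(cudaStreamDestroy(st[s]));
    }
    CHECK(cudaFreeHost(hIn));
    CHECK(cudaFreeHost(hOut));
    CHECK(cudaFreeHost(hKeep));
    return kept;
}

int main()
{
    printf("frames kept by the veto: %d\n", runPipeline(6));
    return 0;
}
```

In this sketch the large `hOut` copy is wasted bandwidth for every vetoed frame; only the small `hKeep` flag really needs to come back each time.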

The problem is the third stage. Is there a proven solution?
I managed to do this by splitting the stream for that last stage in two and using stream memory operations as a lock mechanism triggered by the veto evaluation, but that solution does not seem to scale well if I work on a stack of data (to increase the available processing time). It would require more streams, and they are limited in number (16 IIRC).

Any help?

BTW, a solution interleaving KERNEL + D2H is unfortunately not an option, as memcpy inside a kernel is only available for D2D transfers.