How can I pause a stream until a CPU thread is ready?

I know that callbacks can be inserted into streams to let streams interact with CPU threads. But all callbacks are handled by the same driver thread, so if a callback in one stream waits for a CPU thread, the driver thread blocks, the callbacks of other streams cannot run, and those other streams stall as well.
Is there any way to make a stream pause automatically once its previous task is done, wait for a CPU thread, and then start the remaining tasks that have already been submitted?
Thanks!

Callbacks may indeed be more proactive than simply recording events in streams, but you seem to be aware of their shortcoming as well.
With multiple streams, I would give precedence to recording events (cudaEventRecord).
In a sense, callbacks may even be redundant: do you really need to know, and act, the instant a stream task completes, particularly with multiple tasks in multiple streams?
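For reference, the callback mechanism under discussion can be sketched with the runtime's `cudaLaunchHostFunc`. This is a minimal sketch, not the poster's code: the kernel `scale`, the host function `markDone`, the helper `check`, and `runWithCallbacks` are hypothetical names. The host function deliberately only bumps a counter and returns; per the CUDA documentation, host functions must not make CUDA API calls, and host functions in independent streams may be serialized, which is why one that blocks on a CPU thread can hold up the others.

```cuda
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical stand-in for "the previous task" in a stream.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Host function: runs on a driver-owned thread after all earlier work in its
// stream has finished. It only increments a counter and returns; it must not
// call CUDA APIs, and blocking here (e.g. waiting on a CPU thread) could delay
// host functions queued in other streams.
static void CUDART_CB markDone(void *userData) {
    static_cast<std::atomic<int> *>(userData)->fetch_add(1);
}

// Enqueue one kernel plus one host function per stream, synchronize every
// stream, and return how many host functions ran.
int runWithCallbacks(int nStreams, int n) {
    const int kMaxStreams = 8;
    assert(nStreams > 0 && nStreams <= kMaxStreams && n > 0);
    cudaStream_t streams[kMaxStreams];
    float *d = nullptr;
    size_t bytes = sizeof(float) * (size_t)n * nStreams;
    check(cudaMalloc(&d, bytes), "cudaMalloc");
    check(cudaMemset(d, 0, bytes), "cudaMemset");

    std::atomic<int> done{0};
    for (int k = 0; k < nStreams; ++k) {
        check(cudaStreamCreate(&streams[k]), "cudaStreamCreate");
        scale<<<(n + 255) / 256, 256, 0, streams[k]>>>(d + (size_t)k * n, n);
        check(cudaGetLastError(), "scale launch");
        check(cudaLaunchHostFunc(streams[k], markDone, &done), "cudaLaunchHostFunc");
    }
    // Host functions are stream work, so synchronizing a stream also waits for them.
    for (int k = 0; k < nStreams; ++k) {
        check(cudaStreamSynchronize(streams[k]), "cudaStreamSynchronize");
        check(cudaStreamDestroy(streams[k]), "cudaStreamDestroy");
    }
    check(cudaFree(d), "cudaFree");
    return done.load();
}
```

Calling `runWithCallbacks(2, 1024)` should report that both host functions ran, since every stream is synchronized before the counter is read.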

You would then move from (a) issue a task, then wait/check, to (b) issue all tasks first, then wait/check.
You can also issue more sophisticated synchronization orders via cudaStreamWaitEvent, letting streams synchronize and align with each other, so that the host only has to take a supervisory role.
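Pattern (b), issue everything first and check afterwards, might look like the following minimal sketch. It assumes one completion event per stream, recorded with `cudaEventRecord` right after that stream's work, and polled with `cudaEventQuery`, which returns `cudaErrorNotReady` until the event completes, so the host never blocks inside a callback. The kernel `addOne`, the helper `check`, and `issueAllThenPoll` are hypothetical names used only for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical per-stream task: add 1 to every element of this stream's chunk.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Issue all work first, then poll the per-stream events; returns the sum of
// all elements once every stream has finished.
long long issueAllThenPoll(int nStreams, int n) {
    const int kMaxStreams = 8;
    assert(nStreams > 0 && nStreams <= kMaxStreams && n > 0);
    cudaStream_t streams[kMaxStreams];
    cudaEvent_t done[kMaxStreams];
    size_t total = (size_t)n * nStreams;
    int *d = nullptr;
    check(cudaMalloc(&d, total * sizeof(int)), "cudaMalloc");
    check(cudaMemset(d, 0, total * sizeof(int)), "cudaMemset");

    // (b) Order everything first: a kernel and a completion event per stream.
    for (int k = 0; k < nStreams; ++k) {
        check(cudaStreamCreate(&streams[k]), "cudaStreamCreate");
        check(cudaEventCreateWithFlags(&done[k], cudaEventDisableTiming), "cudaEventCreateWithFlags");
        addOne<<<(n + 255) / 256, 256, 0, streams[k]>>>(d + (size_t)k * n, n);
        check(cudaGetLastError(), "addOne launch");
        check(cudaEventRecord(done[k], streams[k]), "cudaEventRecord");
    }

    // ...then check: poll without blocking; the host is free between polls.
    bool finished[kMaxStreams] = {false};
    int remaining = nStreams;
    while (remaining > 0) {
        for (int k = 0; k < nStreams; ++k) {
            if (finished[k]) continue;
            cudaError_t st = cudaEventQuery(done[k]);
            if (st == cudaSuccess) {
                finished[k] = true;
                --remaining;
            } else if (st != cudaErrorNotReady) {
                check(st, "cudaEventQuery");
            }
        }
        // Other host work could be done here between polls.
    }

    std::vector<int> h(total);
    check(cudaMemcpy(h.data(), d, total * sizeof(int), cudaMemcpyDeviceToHost), "cudaMemcpy");
    long long sum = 0;
    for (int v : h) sum += v;
    for (int k = 0; k < nStreams; ++k) {
        check(cudaEventDestroy(done[k]), "cudaEventDestroy");
        check(cudaStreamDestroy(streams[k]), "cudaStreamDestroy");
    }
    check(cudaFree(d), "cudaFree");
    return sum;
}
```

Each element is incremented once, so the returned sum should equal `nStreams * n`. `cudaEventSynchronize` could replace the polling loop when the host has nothing else to do.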
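A cross-stream dependency expressed with `cudaStreamWaitEvent` could be sketched as below, assuming a simple producer/consumer split across two non-blocking streams. The consumer stream waits on the GPU for the event recorded in the producer stream, and the host thread never blocks until the single synchronization at the end. The kernels `produce` and `consume`, the helper `check`, and `produceConsume` are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical producer: fill buf with 0, 1, ..., n-1.
__global__ void produce(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

// Hypothetical consumer: double each produced value.
__global__ void consume(const int *buf, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = buf[i] * 2;
}

// Producer stream -> event -> consumer stream; returns the sum of the output.
long long produceConsume(int n) {
    assert(n > 0);
    int *buf = nullptr, *out = nullptr;
    check(cudaMalloc(&buf, n * sizeof(int)), "cudaMalloc buf");
    check(cudaMalloc(&out, n * sizeof(int)), "cudaMalloc out");

    cudaStream_t producer, consumer;
    check(cudaStreamCreateWithFlags(&producer, cudaStreamNonBlocking), "create producer");
    check(cudaStreamCreateWithFlags(&consumer, cudaStreamNonBlocking), "create consumer");
    cudaEvent_t ready;
    check(cudaEventCreateWithFlags(&ready, cudaEventDisableTiming), "cudaEventCreateWithFlags");

    int blocks = (n + 255) / 256;
    produce<<<blocks, 256, 0, producer>>>(buf, n);
    check(cudaGetLastError(), "produce launch");
    check(cudaEventRecord(ready, producer), "cudaEventRecord");

    // GPU-side dependency: work issued to 'consumer' after this call waits for
    // 'ready'. The host thread returns immediately and keeps issuing.
    check(cudaStreamWaitEvent(consumer, ready, 0), "cudaStreamWaitEvent");
    consume<<<blocks, 256, 0, consumer>>>(buf, out, n);
    check(cudaGetLastError(), "consume launch");

    // The host synchronizes only once, at the end.
    check(cudaStreamSynchronize(consumer), "cudaStreamSynchronize");
    std::vector<int> h(n);
    check(cudaMemcpy(h.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost), "cudaMemcpy");
    long long sum = 0;
    for (int v : h) sum += v;

    check(cudaEventDestroy(ready), "cudaEventDestroy");
    check(cudaStreamDestroy(producer), "destroy producer");
    check(cudaStreamDestroy(consumer), "destroy consumer");
    check(cudaFree(buf), "cudaFree buf");
    check(cudaFree(out), "cudaFree out");
    return sum;
}
```

Since the output holds 2 * i for i in [0, n), the sum should be n * (n - 1). Note that `cudaStreamWaitEvent` binds to the most recent `cudaEventRecord` issued before the call, so the record has to be ordered first.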

Provide an example if you need a more detailed answer.