synchronizing on events


if i need to launch a large number of kernel blocks but, because of global memory utilization constraints, can only launch a batch at a time, would it be better to record an event and synchronize on it, rather than synchronizing on the stream or the device?

the thinking is that with an event, one can still issue kernels after the recorded event, so that kernel launching is monitored and throttled while the device still has a queue of work to run

launch x kernels, record an event, launch y kernels; upon the event, again launch x kernels…

that should be better than: launch x kernels; synchronize; launch x kernels; repeat until done
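to make the idea concrete, here is a minimal sketch of the event-based scheme, under some assumptions that are mine and not part of the question: a single stream, a hypothetical kernel `work` standing in for the real one, a hypothetical helper `runBatched`, and a ring of `depth` events so that at most `depth` batches are queued at once. the host waits only on the oldest outstanding event, so newer batches stay queued behind it. whether this actually beats a plain synchronize would need to be measured.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: thread 0 of every block
// records which batch the block belonged to.
__global__ void work(int *out, int batch)
{
    if (threadIdx.x == 0)
        out[blockIdx.x] = batch;
}

// Issue numBatches launches while keeping at most `depth` batches queued.
// Assumes numBatches, blocksPerBatch and depth are all >= 1.
// Returns the number of wrong results (0 on success), or -1 on a CUDA error.
int runBatched(int numBatches, int blocksPerBatch, int depth)
{
    const size_t n = (size_t)numBatches * blocksPerBatch;
    cudaStream_t stream;
    int *dOut = nullptr;
    if (cudaStreamCreate(&stream) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return -1;

    // One event per in-flight slot; timing is not needed here.
    std::vector<cudaEvent_t> done(depth);
    for (auto &e : done)
        if (cudaEventCreateWithFlags(&e, cudaEventDisableTiming) != cudaSuccess)
            return -1;

    for (int b = 0; b < numBatches; ++b) {
        const int slot = b % depth;
        // Wait only for the batch that last used this slot; the
        // batches issued after it remain queued on the device.
        if (b >= depth) cudaEventSynchronize(done[slot]);
        work<<<blocksPerBatch, 64, 0, stream>>>(dOut + (size_t)b * blocksPerBatch, b);
        cudaEventRecord(done[slot], stream);
    }

    // Drain the queue, then pick up any launch or execution error.
    cudaError_t err = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = cudaGetLastError();

    // Copy the results back and check that every block ran in its batch.
    std::vector<int> h(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(h.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    int bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (h[i] != (int)(i / blocksPerBatch)) ++bad;

    for (auto &e : done) cudaEventDestroy(e);
    cudaFree(dOut);
    cudaStreamDestroy(stream);
    return err == cudaSuccess ? bad : -1;
}
```

with `depth` set to 1 this degenerates into the launch, synchronize, launch pattern; a larger `depth` trades more memory in flight for a deeper work queue. a call such as `runBatched(8, 4, 2)` returns 0 when every block of every batch has run.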