Let’s assume the flag variable is stored in device memory.
You could launch a kernel into a stream that spins until that variable becomes 2:
__global__ void k(volatile int *d) { while (*d != 2) {} }
Then record the event after that kernel:
k<<<1, 1, 0, stream>>>(d);
cudaEventRecord(...);
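In another stream, wait for that event (for example with cudaStreamWaitEvent). Work issued to that stream after the wait will not start until the spinning kernel has seen the flag become 2 and exited.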
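
Putting the pieces together, here is a minimal sketch of the whole pattern. Only the kernel k and the launch/record steps come from the answer above. The names waiter, worker, setter, flagIsTwo, dependent and the CHECK macro are hypothetical, and the cudaMemcpyAsync in a third stream is just a stand-in for whatever actually sets the flag in your application. It assumes the device can run a copy concurrently with a running kernel; if nothing is able to update the flag while k is spinning, this will hang.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(1);                                          \
        }                                                          \
    } while (0)

// Spins until the flag in device memory becomes 2.
__global__ void k(volatile int *d) { while (*d != 2) {} }

// Hypothetical dependent work: stores the flag value it observes.
__global__ void dependent(const volatile int *d, int *out) { *out = *d; }

int main() {
    int *d_flag = nullptr, *d_out = nullptr, *h_two = nullptr;
    CHECK(cudaMalloc(&d_flag, sizeof(int)));
    CHECK(cudaMalloc(&d_out, sizeof(int)));
    CHECK(cudaMallocHost(&h_two, sizeof(int)));  // pinned host memory for the async copy
    *h_two = 2;
    CHECK(cudaMemset(d_flag, 0, sizeof(int)));
    CHECK(cudaMemset(d_out, 0, sizeof(int)));
    CHECK(cudaDeviceSynchronize());  // make sure the flag is 0 before k starts

    // Non-blocking streams, so nothing implicitly synchronizes with the default stream.
    cudaStream_t waiter, worker, setter;
    CHECK(cudaStreamCreateWithFlags(&waiter, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&worker, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&setter, cudaStreamNonBlocking));

    cudaEvent_t flagIsTwo;
    CHECK(cudaEventCreateWithFlags(&flagIsTwo, cudaEventDisableTiming));

    // Stream 1: spin on the flag, then record the event after the kernel.
    k<<<1, 1, 0, waiter>>>(d_flag);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(flagIsTwo, waiter));

    // Stream 2: anything issued after this wait is held back until the event completes.
    CHECK(cudaStreamWaitEvent(worker, flagIsTwo, 0));
    dependent<<<1, 1, 0, worker>>>(d_flag, d_out);
    CHECK(cudaGetLastError());

    // Stand-in for whatever sets the flag: an async copy in a third stream.
    CHECK(cudaMemcpyAsync(d_flag, h_two, sizeof(int),
                          cudaMemcpyHostToDevice, setter));

    CHECK(cudaStreamSynchronize(worker));
    int h_out = 0;
    CHECK(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
    std::printf("dependent kernel saw flag = %d\n", h_out);

    CHECK(cudaEventDestroy(flagIsTwo));
    CHECK(cudaStreamDestroy(waiter));
    CHECK(cudaStreamDestroy(worker));
    CHECK(cudaStreamDestroy(setter));
    CHECK(cudaFreeHost(h_two));
    CHECK(cudaFree(d_flag));
    CHECK(cudaFree(d_out));
    return 0;
}
```

Keep in mind that the spinning kernel occupies the GPU for as long as it waits, so this is probably only reasonable when the flag is expected to change soon.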