cudaEventSynchronize with cudaMemcpyAsync (host to device)

I want cudaEventSynchronize() to return as soon as the target event completes, but it seems to return only after more events have finished, sometimes even after the event from the most recent cudaEventRecord() call. Is this expected?

event_t copyHost2Dev(){
cudaStreamCreate(&stream); // stream is different in each loop iteration.
cudaEventRecord(eid, stream);
return eid;
}
void wait(id){

eid =
for(int i = 0; i < 5; ++i){
wait(eid[0]) // This line returns only when all events are synced. What I expect is that it returns once eid[0] has finished.

I looked into this with Nsight Systems and found that cudaEventSynchronize(eid) waits for all events, or in most cases for the event from the last cudaEventRecord() call, regardless of which event I pass in. Is this expected?

As far as I know, all host-to-device memory copies are serialized. Is this correct?

If you use cudaEventRecord the same way as in your example, it will not work the way you intended.
cudaError_t cudaEventRecord ( cudaEvent_t event, cudaStream_t stream = 0 )

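If you do not pass the stream to cudaEventRecord, the event is recorded on the default stream. The legacy default stream synchronizes with the other blocking streams, so an event recorded there completes only after the work previously issued to those streams has finished.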
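To make the role of the stream argument concrete, here is a minimal sketch of a copy helper that records its event on the same stream as the copy. The helper name copyAndRecord and its parameters are hypothetical and not taken from the original post; only the CUDA runtime calls themselves are real API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (not from the original post): enqueue one
// host-to-device copy on `stream`, then record `done` on the SAME stream.
// `done` then completes once the work queued on `stream` before it,
// including this copy, has finished.
cudaError_t copyAndRecord(void* dst, const void* src, size_t bytes,
                          cudaStream_t stream, cudaEvent_t done)
{
    cudaError_t err = cudaMemcpyAsync(dst, src, bytes,
                                      cudaMemcpyHostToDevice, stream);
    if (err != cudaSuccess) return err;

    // Passing `stream` explicitly is the important part. Calling
    // cudaEventRecord(done) alone would use the default argument
    // (stream = 0) and record the event on the default stream instead.
    return cudaEventRecord(done, stream);
}
```

A caller would create one stream and one event per copy, call the helper for each, and then pass only the event of interest to cudaEventSynchronize.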

My typo. I actually call cudaEventRecord(eid, stream), where stream is the same stream used in cudaMemcpyAsync(). I still do not get the behavior I want.

If that’s the case, you should provide a complete example which can be compiled and run.
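A reproducer of the kind requested might look like the sketch below. It is not the original poster's code: the function name countDoneAfterFirstSync, the CHECK macro, and the use of pinned host buffers (cudaMallocHost) are assumptions made for illustration. It issues n host-to-device copies on n separate streams, records one event per stream, waits only on the first event, and then polls the remaining events with cudaEventQuery.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstring>
#include <vector>

// On any CUDA error, report it and bail out with -1 (resources are not
// released on this error path, to keep the sketch short).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return -1;                                                \
        }                                                             \
    } while (0)

// Returns how many of the events 1..n-1 were already complete when the
// host polled them right after cudaEventSynchronize(events[0]) returned,
// or -1 on a CUDA error.
int countDoneAfterFirstSync(int n, size_t bytes)
{
    std::vector<cudaStream_t> streams(n);
    std::vector<cudaEvent_t>  events(n);
    std::vector<void*> host(n, nullptr), dev(n, nullptr);

    // One stream, one event, one pinned host buffer and one device buffer
    // per copy. Pinned memory is an assumption made here so that
    // cudaMemcpyAsync can run asynchronously with respect to the host.
    for (int i = 0; i < n; ++i) {
        CHECK(cudaStreamCreate(&streams[i]));
        CHECK(cudaEventCreate(&events[i]));
        CHECK(cudaMallocHost(&host[i], bytes));
        CHECK(cudaMalloc(&dev[i], bytes));
        std::memset(host[i], 0, bytes);
    }

    // Enqueue every copy, each followed by an event on the same stream.
    for (int i = 0; i < n; ++i) {
        CHECK(cudaMemcpyAsync(dev[i], host[i], bytes,
                              cudaMemcpyHostToDevice, streams[i]));
        CHECK(cudaEventRecord(events[i], streams[i]));
    }

    // Wait only for the first copy, then poll the remaining events.
    CHECK(cudaEventSynchronize(events[0]));
    int done = 0;
    for (int i = 1; i < n; ++i) {
        cudaError_t q = cudaEventQuery(events[i]);
        if (q == cudaSuccess) {
            ++done;                       // this copy has already finished
        } else if (q != cudaErrorNotReady) {
            std::fprintf(stderr, "cudaEventQuery failed: %s\n",
                         cudaGetErrorString(q));
            return -1;
        }
    }

    // Drain all outstanding work before releasing resources.
    CHECK(cudaDeviceSynchronize());
    for (int i = 0; i < n; ++i) {
        CHECK(cudaEventDestroy(events[i]));
        CHECK(cudaStreamDestroy(streams[i]));
        CHECK(cudaFreeHost(host[i]));
        CHECK(cudaFree(dev[i]));
    }
    return done;
}
```

Calling countDoneAfterFirstSync(5, 64 << 20) from main and building with nvcc gives a rough picture of the ordering. A nonzero result is not necessarily a bug, since small copies may finish before the host gets around to polling; a timeline in Nsight Systems remains the more reliable check.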

cudaEventSynchronize works as expected. I found that the problem was caused by incorrect use.
Thanks for your help.