I expect cudaEventSynchronize() to return as soon as the target event completes, but it seems to return only after more events have finished, sometimes even after the event recorded by the last cudaEventRecord() call. Is this expected?
event_t copyHost2Dev() {
    cudaStreamCreate(&stream);     // a different stream on each call
    cudaEventCreate(&eid);
    cudaMemcpyAsync(..., stream);  // host-to-device copy, other arguments omitted
    cudaEventRecord(eid, stream);
    return eid;
}
void wait(event_t id) {
    cudaEventSynchronize(id);
}
eid = []
for (int i = 0; i < 5; ++i) {
    eid.append(copyHost2Dev());
}
wait(eid[0]);  // This call returns only after all events have completed. I expected it to return as soon as eid[0] had completed.
I looked into this with Nsight Systems and found that cudaEventSynchronize(eid) waits for all events, or in most cases for the event from the last cudaEventRecord() call, regardless of which event I actually passed in. Is this expected?
My understanding is that all host-to-device memory copies are serialized. Is this correct?