Incorrect times recorded by cudaEvents when used with streams

I wrote a program with two streams, each of which runs several kernel launches. There are also two arrays of cudaEvents: one for start events (eventStartArray) and one for stop events (eventStopArray). An eventStartArray element is recorded before each kernel launch, and an eventStopArray element is recorded after it. All of these operations are done in a loop. For example, with 8 iterations and two streams, there are 16 elements in each array. To keep the events of each stream separate, both arrays are two-dimensional with two rows, one per stream.
After the launch loop, a separate loop calls cudaEventSynchronize on each element.
The problem is that, except for the first element of each stream, the start times are recorded incorrectly: the stop event of kernel launch i appears to be taken as the start event of kernel launch i+1. However, this happens only for the kernel with the shorter execution time.
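To make the symptom easier to reproduce, here is a minimal, self-contained sketch of the setup described above. It is only an approximation under stated assumptions: `busyKernel`, its `work` parameter, the problem size, and the helper `runTimedLaunches` are hypothetical stand-ins and not the real kernels. For every launch it reads the start-to-stop time with `cudaEventElapsedTime`, and also the gap between the stop event of launch i-1 and the start event of launch i in the same stream, which is the quantity the question is about.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_STREAMS 2
#define ITER 8  // loop iterations, as in the example in the text

// Return the error to the caller on the first failing CUDA call (sketch only:
// resources are not released on the error path).
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) return e_; } while (0)

// Hypothetical stand-in for kernel0/kernel1: `work` controls the run time.
__global__ void busyKernel(float *data, int n, int work)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float v = data[idx];
        for (int j = 0; j < work; j++)
            v = v * 1.0001f + 0.5f;
        data[idx] = v;
    }
}

// durationMs[s][i]: start -> stop of launch i in stream s.
// gapMs[s][i]: stop of launch i-1 -> start of launch i (0 for i == 0).
cudaError_t runTimedLaunches(float durationMs[NUM_STREAMS][ITER],
                             float gapMs[NUM_STREAMS][ITER])
{
    const int n = 1 << 20;                      // assumed problem size
    const int work[NUM_STREAMS] = {100, 2000};  // assumed: short and long kernel
    cudaStream_t streams[NUM_STREAMS];
    cudaEvent_t eventStartArray[NUM_STREAMS][ITER];
    cudaEvent_t eventStopArray[NUM_STREAMS][ITER];
    float *buf[NUM_STREAMS];

    for (int s = 0; s < NUM_STREAMS; s++) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMalloc((void **)&buf[s], n * sizeof(float)));
        CHECK(cudaMemset(buf[s], 0, n * sizeof(float)));
        for (int i = 0; i < ITER; i++) {
            CHECK(cudaEventCreate(&eventStartArray[s][i]));
            CHECK(cudaEventCreate(&eventStopArray[s][i]));
        }
    }

    // Launch loop: record start, launch, record stop, alternating streams.
    for (int i = 0; i < ITER; i++) {
        for (int s = 0; s < NUM_STREAMS; s++) {
            CHECK(cudaEventRecord(eventStartArray[s][i], streams[s]));
            busyKernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(buf[s], n, work[s]);
            CHECK(cudaEventRecord(eventStopArray[s][i], streams[s]));
        }
    }
    CHECK(cudaGetLastError());

    // Separate loop: wait for each stop event, then read the timings.
    for (int i = 0; i < ITER; i++) {
        for (int s = 0; s < NUM_STREAMS; s++) {
            CHECK(cudaEventSynchronize(eventStopArray[s][i]));
            CHECK(cudaEventElapsedTime(&durationMs[s][i],
                                       eventStartArray[s][i], eventStopArray[s][i]));
            gapMs[s][i] = 0.0f;
            if (i > 0) {
                CHECK(cudaEventElapsedTime(&gapMs[s][i],
                                           eventStopArray[s][i - 1], eventStartArray[s][i]));
            }
        }
    }

    for (int s = 0; s < NUM_STREAMS; s++) {
        for (int i = 0; i < ITER; i++) {
            CHECK(cudaEventDestroy(eventStartArray[s][i]));
            CHECK(cudaEventDestroy(eventStopArray[s][i]));
        }
        CHECK(cudaFree(buf[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }
    return cudaSuccess;
}

int main()
{
    float durationMs[NUM_STREAMS][ITER], gapMs[NUM_STREAMS][ITER];
    cudaError_t err = runTimedLaunches(durationMs, gapMs);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int s = 0; s < NUM_STREAMS; s++)
        for (int i = 0; i < ITER; i++)
            printf("stream %d, launch %d: duration %.3f ms, gap after previous stop %.3f ms\n",
                   s, i, durationMs[s][i], gapMs[s][i]);
    return 0;
}
```

A gap close to zero for a launch means its start event was timestamped right after the previous stop event in that stream, which is the behavior described above. The launch loop from the actual code is: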

for (int i = 0;i < iter;i++) {
	// stream 0: record start, launch, record stop
	cudaEventRecord(eventStartArray[0][i], streams[0]);
	kernel0<<<..., ..., 0, streams[0]>>>(...);
	cudaEventRecord(eventStopArray[0][i], streams[0]);

	// stream 1: record start, launch, record stop
	cudaEventRecord(eventStartArray[1][i], streams[1]);
	kernel1<<<..., ..., 0, streams[1]>>>(...);
	cudaEventRecord(eventStopArray[1][i], streams[1]);
}
for (int i = 0;i < iter;i++)
	for (int k = 0;k < 2;k++)