Unexpected result with cudaEventRecord on stream
cudaEventRecord on a stream waits for completion o

I am interested in capturing the events happening in a single CUDA stream out of many streams. As per the documentation, I can do this by calling cudaEventRecord with the stream as the second argument. However, the result looks as if the stream argument is ignored. The following code is expected to give the elapsed time for streams[0], which should be almost 0 because no operations are issued to streams[0] between the start and stop events. Instead I get an elapsed time of about 300 ms, which is actually the computation time of streams[1]. Is there a property I need to set that I am missing?

cudaEventRecord(start, streams[0]);
// asynchronously launch nstreams kernels, each operating on its own portion of data
//init_array<<<blocks, threads, 0, streams[0]>>>(d_a + 0 * n / nstreams, d_c, niterations);
init_array<<<blocks, threads, 0, streams[1]>>>(d_a + 1 * n / nstreams, d_c, niterations);
cudaEventRecord(stop, streams[0]);

cudaEventSynchronize(stop);
CUDA_SAFE_CALL( cudaEventElapsedTime(&elapsed_time, start, stop) );
printf("elapsed time:%.2f\tstream:%d\n", elapsed_time, 0);
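
For anyone who wants to try this, the snippet above can be fleshed out into a standalone program roughly as follows. This is only a sketch: the body of init_array, the CHECK macro, and the values of n, niterations, threads, and blocks are assumptions made up for illustration, since the original kernel and sizes are not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error-check macro, standing in for CUDA_SAFE_CALL.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical stand-in for init_array: the real kernel body is not shown
// in the post, so this just does repeated work on each element.
__global__ void init_array(int *g_data, const int *factor, int num_iterations)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < num_iterations; i++)
        g_data[idx] += *factor;
}

int main()
{
    const int nstreams    = 2;        // assumed
    const int n           = 1 << 20;  // assumed element count
    const int niterations = 1000;     // assumed
    const int threads     = 256;      // assumed
    const int blocks      = n / (nstreams * threads);

    int *d_a = nullptr;
    int *d_c = nullptr;
    int  c   = 5;
    CHECK(cudaMalloc(&d_a, n * sizeof(int)));
    CHECK(cudaMalloc(&d_c, sizeof(int)));
    CHECK(cudaMemset(d_a, 0, n * sizeof(int)));
    CHECK(cudaMemcpy(d_c, &c, sizeof(int), cudaMemcpyHostToDevice));

    cudaStream_t streams[nstreams];
    for (int i = 0; i < nstreams; i++)
        CHECK(cudaStreamCreate(&streams[i]));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Both events go into streams[0]; the only kernel goes into streams[1].
    CHECK(cudaEventRecord(start, streams[0]));
    init_array<<<blocks, threads, 0, streams[1]>>>(d_a + 1 * n / nstreams,
                                                   d_c, niterations);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop, streams[0]));

    CHECK(cudaEventSynchronize(stop));
    float elapsed_time = 0.0f;
    CHECK(cudaEventElapsedTime(&elapsed_time, start, stop));
    printf("elapsed time:%.2f\tstream:%d\n", elapsed_time, 0);

    // Wait for the kernel in streams[1] before releasing resources.
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    for (int i = 0; i < nstreams; i++)
        CHECK(cudaStreamDestroy(streams[i]));
    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_c));
    return 0;
}
```

The measured value will likely depend on the GPU and the CUDA version, so I have not listed an expected output. Raising niterations should make it easier to see whether the time between the two events follows the kernel in streams[1].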

It looks like a bug to me. Is there any way I can contact the CUDA developers who own the event API component?

Thanks in advance!
smeitei