How to analyse the various events taking place on the GPU using CUDA

I am processing an image on the GPU using CUDA streams. I have divided the image into three smaller segments, and I transfer each segment using a separate CUDA stream. Each of the three streams then launches the image-processing kernel and copies its processed data back to the CPU.
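For readers who want something concrete to profile, here is a minimal sketch of the scheme described above. It is not my actual code: the `invert` kernel, the segment size, and the function name `runThreeStreamPipeline` are all made up for illustration, and error checking is reduced to a single check at the end.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the image-processing kernel: inverts 8-bit pixels.
__global__ void invert(unsigned char* px, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) px[i] = 255 - px[i];
}

// Splits an image into three segments and pushes each one through its own
// stream: copy in, process, copy out. Returns true if every pixel was inverted.
bool runThreeStreamPipeline() {
    const int kStreams = 3;
    const int kSegment = 1 << 20;                 // pixels per segment (assumed)
    const int kTotal = kStreams * kSegment;

    unsigned char *hImg, *dImg;
    cudaMallocHost(&hImg, kTotal);                // pinned memory, needed for truly async copies
    cudaMalloc(&dImg, kTotal);
    for (int i = 0; i < kTotal; ++i) hImg[i] = i % 256;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    for (int s = 0; s < kStreams; ++s) {
        int off = s * kSegment;
        cudaMemcpyAsync(dImg + off, hImg + off, kSegment,
                        cudaMemcpyHostToDevice, streams[s]);
        invert<<<(kSegment + 255) / 256, 256, 0, streams[s]>>>(dImg + off, kSegment);
        cudaMemcpyAsync(hImg + off, dImg + off, kSegment,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();                      // wait for all three streams
    bool ok = (cudaGetLastError() == cudaSuccess);

    for (int i = 0; ok && i < kTotal; ++i)
        ok = (hImg[i] == 255 - (i % 256));

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(dImg);
    cudaFreeHost(hImg);
    return ok;
}

int main() {
    printf("pipeline %s\n", runThreeStreamPipeline() ? "succeeded" : "failed");
    return 0;
}
```

The host buffer is allocated with `cudaMallocHost` because, as far as I understand, copies from ordinary pageable memory are not guaranteed to overlap with kernels in other streams.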

I want to see whether this streaming scheme is really helping me, so I would like to see a timeline of each event. I think Nsight can help with that, but as I am new to CUDA I am not sure which tool I should use. Please advise.

I need something like below:

Stream-based variants can certainly be faster than non-stream-based variants, but they are not guaranteed to be.
In most cases, though, stream-based variants are more economical than non-stream-based ones; device memory footprint would be one measure.

Hence, in evaluating whether streams are helping your application, you need to:
a) determine whether the application finishes quicker, or at least in the same amount of time, when using streams compared to when not using streams;
b) determine whether the application is more economical as a result, and whether this is of value to you.

So I am not certain that it is necessary to record timings for individual events.
You can simply time the whole application, and you can simply note whether the streams run asynchronously, as designed and intended.
The profiler can report on both, on a per-stream basis.
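As a rough illustration of timing the whole run with and without streams, the sketch below measures the same copy-process-copy workload once with a single stream and once with three. The `scale` kernel, the buffer size, and the helper name `timedRun` are assumptions made for the example, not anything from the original application; it uses a host clock bracketed by `cudaDeviceSynchronize()` rather than per-event timing.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real image-processing kernel.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Copies n floats to the device, processes them and copies them back, split
// evenly over nStreams streams (n is assumed to be divisible by nStreams).
// Returns the wall-clock time of the whole sequence in milliseconds.
double timedRun(float* hBuf, float* dBuf, int n, int nStreams) {
    std::vector<cudaStream_t> streams(nStreams);
    for (auto& s : streams) cudaStreamCreate(&s);
    const int seg = n / nStreams;

    cudaDeviceSynchronize();                      // start from an idle device
    auto t0 = std::chrono::steady_clock::now();
    for (int s = 0; s < nStreams; ++s) {
        float* h = hBuf + s * seg;
        float* d = dBuf + s * seg;
        cudaMemcpyAsync(d, h, seg * sizeof(float), cudaMemcpyHostToDevice, streams[s]);
        scale<<<(seg + 255) / 256, 256, 0, streams[s]>>>(d, seg);
        cudaMemcpyAsync(h, d, seg * sizeof(float), cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();                      // wait for every stream to finish
    auto t1 = std::chrono::steady_clock::now();

    for (auto& s : streams) cudaStreamDestroy(s);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int n = 3 << 20;                        // assumed element count
    float *hBuf, *dBuf;
    cudaMallocHost(&hBuf, n * sizeof(float));     // pinned, so async copies can overlap
    cudaMalloc(&dBuf, n * sizeof(float));
    for (int i = 0; i < n; ++i) hBuf[i] = 1.0f;

    timedRun(hBuf, dBuf, n, 1);                   // warm-up run, result discarded
    double one = timedRun(hBuf, dBuf, n, 1);
    double three = timedRun(hBuf, dBuf, n, 3);
    printf("1 stream: %.3f ms, 3 streams: %.3f ms\n", one, three);

    cudaFree(dBuf);
    cudaFreeHost(hBuf);
    return 0;
}
```

A single measurement like this is noisy, so it is probably worth repeating each configuration several times before drawing conclusions. Whether the three-stream run comes out ahead depends on the device and on how much copy and kernel work there is to overlap.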

@little_jimmy: thanks. Actually, I wanted to know which tool can be used to generate the timeline. I found out that I can use the NVIDIA Visual Profiler for that purpose, but it is not working in my case. You can have a look at my question on SO.

c++ - NVIDIA Visual profiler does not generate a timeline - Stack Overflow

Are you certain that your application is race-free, and also free of stream races?

My preliminary advice would be to run both memcheck and racecheck on the application,
and to verify that you have not tied up streams and stream-related variables in races.
Although stream races are possible, I am not aware of a tool that can track them.