Introduction to streams in CUDA: help

Can anyone give me an introduction to stream management and asynchronous execution in CUDA?
Thanks a lot!

I think there is a simple code sample in the CUDA Programming Guide. However, in my experiments, streams only seem able to overlap a memory copy with a kernel; overall performance does not improve when both streams contain only kernel launches. I guess the kernels are still executed sequentially.

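That is correct: on devices of compute capability 1.x, only one kernel executes at a time, so streams there are intended to overlap computation and memory transfers only. Devices of compute capability 2.0 and later can also run kernels from different streams concurrently when resources allow.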
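
For anyone landing on this thread, here is a minimal sketch of the copy/compute overlap pattern discussed above. It is not the sample from the programming guide: the kernel `scale`, the function `run_streams_demo`, and the chunk sizes are made up for illustration. Whether the copies really overlap with the kernels depends on the device (check `asyncEngineCount` in `cudaDeviceProp`), so treat it as a starting point and confirm with a profiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element of a chunk.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns the number of wrong elements (0 on success), or -1 on a CUDA error.
int run_streams_demo()
{
    const int kStreams = 2;            // assumed stream count
    const int kChunk   = 1 << 20;      // assumed elements per stream
    const int kTotal   = kStreams * kChunk;
    const size_t chunkBytes = kChunk * sizeof(float);

    float *h = NULL, *d = NULL;
    // Asynchronous copies require page-locked (pinned) host memory.
    if (cudaMallocHost((void **)&h, kTotal * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d, kTotal * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h);
        return -1;
    }
    for (int i = 0; i < kTotal; ++i) h[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream gets its own copy-in, kernel, copy-out sequence.
    // Work within one stream runs in order; work in different streams
    // may overlap (e.g. stream 1's copy with stream 0's kernel).
    for (int s = 0; s < kStreams; ++s) {
        float *hp = h + s * kChunk;
        float *dp = d + s * kChunk;
        cudaMemcpyAsync(dp, hp, chunkBytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(kChunk + 255) / 256, 256, 0, streams[s]>>>(dp, kChunk);
        cudaMemcpyAsync(hp, dp, chunkBytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams to finish before touching the results on the host.
    cudaError_t err = cudaDeviceSynchronize();

    int wrong = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < kTotal; ++i)
            if (h[i] != 2.0f) ++wrong;
    }

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return (err == cudaSuccess) ? wrong : -1;
}

int main()
{
    int result = run_streams_demo();
    printf("run_streams_demo returned %d\n", result);
    return result == 0 ? 0 : 1;
}
```

The important details are the pinned host buffer and passing the stream as the last argument to both `cudaMemcpyAsync` and the kernel launch; with pageable memory or the default stream, the calls tend to serialize and no overlap is observed.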