Concurrent kernel execution using streams

How can I launch different kernels concurrently using streams? Please guide me, as I am new to CUDA.
Thanks in advance.

Study the following:

  1. The Asynchronous Concurrent Execution section of the CUDA C Programming Guide:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#asynchronous-concurrent-execution

  2. The concurrent kernels CUDA sample code:

http://docs.nvidia.com/cuda/cuda-samples/index.html#concurrent-kernels
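The basic pattern those references describe can be sketched as follows. This is a minimal illustration, not the official sample: it assumes a CUDA-capable device and compilation with `nvcc`, and the kernels `scaleKernel` and `offsetKernel` plus the helper `run_concurrent_demo` are hypothetical names chosen for the example. Each independent chunk of work (copy in, kernel, copy out) is issued into its own non-default stream. Because there is no ordering between the two streams, the device is *allowed* to overlap the kernels, but it is not guaranteed to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the failing call and return 1 from the enclosing function.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            return 1;                                                \
        }                                                            \
    } while (0)

// Two different (hypothetical) kernels; each touches only its own buffer.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

__global__ void offsetKernel(float *data, int n, float offset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += offset;
}

// Hypothetical helper: returns 0 if both streams produced correct results.
int run_concurrent_demo() {
    const int n = 1 << 14;
    const size_t bytes = n * sizeof(float);

    // Pinned host memory is required for cudaMemcpyAsync to be truly async.
    float *h_a, *h_b, *d_a, *d_b;
    CHECK(cudaMallocHost(&h_a, bytes));
    CHECK(cudaMallocHost(&h_b, bytes));
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 1.0f; }

    // One non-default stream per independent piece of work.
    cudaStream_t s1, s2;
    CHECK(cudaStreamCreate(&s1));
    CHECK(cudaStreamCreate(&s2));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Stream s1: copy in, scale by 2, copy out (ordered within s1).
    CHECK(cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s1));
    scaleKernel<<<blocks, threads, 0, s1>>>(d_a, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, s1));

    // Stream s2: independent of s1, so its kernel may overlap scaleKernel.
    CHECK(cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, s2));
    offsetKernel<<<blocks, threads, 0, s2>>>(d_b, n, 3.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_b, d_b, bytes, cudaMemcpyDeviceToHost, s2));

    // The host must wait for both streams before reading the results.
    CHECK(cudaStreamSynchronize(s1));
    CHECK(cudaStreamSynchronize(s2));

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_a[i] != 2.0f || h_b[i] != 4.0f) ++bad;

    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaStreamDestroy(s2));
    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    CHECK(cudaFreeHost(h_a));
    CHECK(cudaFreeHost(h_b));
    return bad == 0 ? 0 : 1;
}

int main() {
    int rc = run_concurrent_demo();
    std::puts(rc == 0 ? "ok" : "FAILED");
    return rc;
}
```

A few points worth checking against the programming guide: kernels launched into the legacy default stream (no stream argument, or stream `0`) synchronize with other blocking streams and will serialize your work, so use explicitly created streams (or `cudaStreamCreateWithFlags` with `cudaStreamNonBlocking`). Concurrency also requires device support (`cudaDeviceProp::concurrentKernels`) and enough free resources; a kernel whose grid already fills the GPU leaves no room for a second kernel to run alongside it, which is why the concurrent kernels sample uses small launches. A profiler timeline (for example Nsight Systems) is the practical way to confirm whether the kernels actually overlapped.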