CUDA task parallelism on a single GPU

Hello, everyone! Does CUDA support task parallelism now? Can we execute different kernels simultaneously on one GPU?
I found that CUDA streams support concurrency. However, I think streams achieve this concurrency through a pipelining mechanism, and at a fine-grained level the kernels in different streams are still executed serially. If so, CUDA streams can't provide true task parallelism. Am I right?