Hiding memory-transfer overhead behind kernel launches

Hello,
I was watching a YouTube video about the Nsight Systems tool, and it includes a CUDA example. How can I hide the memory-transfer overhead behind kernel launches, as in the attached image, if I use multiple streams and multiple CPU threads like the video did?

Thanks in advance.

The version of Nsys in the screenshot is four years old. Which version are you using, and what are you seeing that you would like not to see?
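For reference, the common way to get the copy/compute overlap shown in such timelines is to allocate pinned host memory, split the data into chunks, and issue copy-in, kernel, and copy-out for each chunk in its own stream. Below is a minimal sketch of that pattern; the `scale` kernel, the `run_pipeline` helper, and the chunk/stream counts are hypothetical and not taken from the video. Actual overlap also assumes the GPU has at least one copy engine that can run concurrently with kernels.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Minimal error check: prints the error and bails out of run_pipeline.
// (Resources are not freed on this error path; fine for a sketch.)
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_), __FILE__, __LINE__); \
    return false; } } while (0)

// Hypothetical example kernel: doubles each element in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Splits n floats into nStreams chunks. Each chunk's H2D copy, kernel and D2H copy
// go into its own stream, so a copy in one stream can overlap a kernel in another.
bool run_pipeline(int n, int nStreams) {
    float* h = nullptr;
    float* d = nullptr;
    // Pinned (page-locked) host memory is needed for cudaMemcpyAsync to be truly async.
    CHECK(cudaMallocHost((void**)&h, n * sizeof(float)));
    CHECK(cudaMalloc((void**)&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    std::vector<cudaStream_t> streams(nStreams);
    for (auto& s : streams) CHECK(cudaStreamCreate(&s));

    int chunk = (n + nStreams - 1) / nStreams;
    for (int s = 0; s < nStreams; ++s) {
        int offset = s * chunk;
        int count = (offset + chunk <= n) ? chunk : n - offset;
        if (count <= 0) break;
        size_t bytes = count * sizeof(float);
        CHECK(cudaMemcpyAsync(d + offset, h + offset, bytes, cudaMemcpyHostToDevice, streams[s]));
        int threads = 256;
        int blocks = (count + threads - 1) / threads;
        scale<<<blocks, threads, 0, streams[s]>>>(d + offset, count, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h + offset, d + offset, bytes, cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for all streams to finish

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 2.0f) { ok = false; break; }
    }

    for (auto& s : streams) cudaStreamDestroy(s);
    cudaFree(d);
    cudaFreeHost(h);
    return ok;
}

int main() {
    bool ok = run_pipeline(1 << 22, 4);
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Compiled with something like `nvcc overlap.cu -o overlap` and profiled with `nsys profile ./overlap`, the timeline should show the copies of one stream running alongside the kernel of another. If several CPU threads each launch their own work, compiling with `nvcc --default-stream per-thread` gives each host thread its own default stream instead of the shared legacy one.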