Hello,
I was watching a YouTube video about the Nsight Systems tool, and it includes a CUDA example. How can I hide the memory overhead behind the kernel launches, as in the attached image, if I use multiple streams and multiple CPU threads like the video did?
Thanks in advance.