How to avoid L1 cache thrashing?

Is there a way to launch a kernel and choose a specific SM to execute it?

In my program, I have a few kernels, each of which needs to be launched many times, and I want to avoid L1 cache thrashing whenever possible.

(That is, I want kernels that use the same data to run on the same SM.)


CUDA does not provide a method to do so. If you really want to get creative, it may be possible using the SMID register (%smid in PTX), which reports the SM a thread is running on. Examples of its use are discussed on several forums. Here is one.
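
For illustration, here is a minimal sketch of reading the SMID register from a kernel. It only observes which SM each block landed on; it does not choose the SM, and CUDA still decides placement. The names current_smid, record_smid and observe_placement are made up for this example. Note that the PTX documentation describes %smid as volatile, so treat the value as a hint rather than a guarantee.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read the PTX special register %smid: the ID of the SM this thread is on.
__device__ unsigned int current_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Hypothetical kernel: thread 0 of each block records the block's SM ID.
__global__ void record_smid(unsigned int *block_to_sm) {
    if (threadIdx.x == 0) {
        block_to_sm[blockIdx.x] = current_smid();
    }
}

// Hypothetical host helper: launch num_blocks blocks and return the SM ID
// seen by each block. Returns an empty vector if any CUDA call fails.
std::vector<unsigned int> observe_placement(int num_blocks) {
    std::vector<unsigned int> host(num_blocks, 0);
    unsigned int *dev = nullptr;
    size_t bytes = num_blocks * sizeof(unsigned int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    record_smid<<<num_blocks, 32>>>(dev);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}

int main() {
    std::vector<unsigned int> placement = observe_placement(8);
    for (size_t b = 0; b < placement.size(); ++b) {
        printf("block %zu ran on SM %u\n", b, placement[b]);
    }
    return placement.empty() ? 1 : 0;
}
```

A kernel could use the same value to decide which chunk of data to work on, for example by having each block pick its work based on the SM it finds itself on. That is a workaround built on top of the scheduler's choice, not a supported way to pin a kernel to an SM.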