Any approach I can think of will probably require some level of special handling.
Follow an exact recipe from an online forum, a blog, or the programming guide. This doesn’t verify your case, but at least it lets you verify the basic mechanism.
Restrict your case, for test purposes, to only access data that is in the carveout region, then look at the profiler metrics for L2 hit rate. Note that the profiler (Nsight Compute) has cache control behavior that you may have to adjust manually, but I think this should not be necessary for the basic test I have suggested here, assuming the kernel of interest has some data reuse in the carveout region.
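To make both suggestions concrete, here is a minimal sketch along the lines of the programming guide's L2 persistence recipe, with the kernel restricted to data inside the persisting window. The kernel `reuseKernel`, the helper `windowBytes`, the 75% carveout fraction, the launch configuration, and the pass count are my own assumptions for illustration; they are not taken from your code, so treat this as a starting point rather than a definitive test.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e_));                          \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: keep the window within carveout and device limit.
static size_t windowBytes(size_t carveout, size_t maxWindow) {
    return std::min(carveout, maxWindow);
}

// Hypothetical kernel: touches only the buffer in the persisting window,
// several times, so there is reuse for L2 to capture.
__global__ void reuseKernel(float *data, size_t n, int passes) {
    size_t start  = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (int p = 0; p < passes; ++p)
        for (size_t j = start; j < n; j += stride)
            data[j] += 1.0f;
}

int main() {
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (prop.persistingL2CacheMaxSize == 0) {
        printf("L2 persistence is not supported on this device\n");
        return 0;
    }

    // Set aside part of L2 for persisting accesses (75% is an assumption).
    size_t carveout = std::min((size_t)prop.l2CacheSize * 3 / 4,
                               (size_t)prop.persistingL2CacheMaxSize);
    CHECK(cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, carveout));

    size_t bytes = windowBytes(carveout, (size_t)prop.accessPolicyMaxWindowSize);
    size_t n = bytes / sizeof(float);
    float *d = nullptr;
    CHECK(cudaMalloc(&d, bytes));
    CHECK(cudaMemset(d, 0, bytes));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Mark the whole buffer as persisting for kernels in this stream.
    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = d;
    attr.accessPolicyWindow.num_bytes = bytes;
    attr.accessPolicyWindow.hitRatio  = 1.0f;
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    CHECK(cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow,
                                 &attr));

    reuseKernel<<<128, 256, 0, stream>>>(d, n, 10);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream));

    // Remove the window and release persisting lines when done.
    attr.accessPolicyWindow.num_bytes = 0;
    CHECK(cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow,
                                 &attr));
    CHECK(cudaCtxResetPersistingL2Cache());

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d));
    printf("done: window of %zu bytes\n", bytes);
    return 0;
}
```

Profile this kernel in Nsight Compute and compare the L2 hit rate with and without the `cudaStreamSetAttribute` call. If the numbers look unexpected across launches, the cache control setting I mentioned is, I believe, the `--cache-control` option of `ncu`; check the Nsight Compute documentation for your version before relying on it.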