How to force the GPU to drop its cache?

I am new to CUDA programming. I have an application with two parts, A and B, where B has data dependencies on A. I want to measure how well B performs without any caching effect (i.e., B should not benefit from data that A previously brought into the cache). Is there any way to achieve this? I know that one possible approach is to write a kernel that pollutes the cache, but this is not very elegant. Does CUDA or any NVIDIA tool provide a mechanism for this?

As far as I know, the state of the GPU L1 cache does not persist between kernel calls.

I am not sure about L2.
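
Since L2 is shared by the whole device and its contents can survive from one kernel launch to the next, a best-effort workaround is to overwrite a scratch buffer larger than L2 between A and B. Here is a minimal sketch of that idea. The helper name `evictL2` and the 2x sizing factor are my own assumptions, not anything CUDA provides, and I am assuming that `cudaMemset` traffic passes through L2 on your device. This is still cache pollution rather than a true flush, so treat it as an approximation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): overwrite a scratch buffer larger
// than L2 so that lines cached by earlier kernels are likely evicted.
// Best effort only; assumes cudaMemset traffic goes through L2.
cudaError_t evictL2(void)
{
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;

    // Query the L2 size of the current device, in bytes.
    int l2Bytes = 0;
    err = cudaDeviceGetAttribute(&l2Bytes, cudaDevAttrL2CacheSize, device);
    if (err != cudaSuccess) return err;
    if (l2Bytes <= 0) return cudaSuccess;  // no L2 reported, nothing to do

    // Assumption: twice the L2 size is enough to displace earlier contents.
    size_t scratchBytes = 2 * (size_t)l2Bytes;
    void *scratch = nullptr;
    err = cudaMalloc(&scratch, scratchBytes);
    if (err != cudaSuccess) return err;

    // Write the whole buffer, then wait so the eviction is complete
    // before the next kernel is launched.
    err = cudaMemset(scratch, 0, scratchBytes);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaError_t freeErr = cudaFree(scratch);
    return (err != cudaSuccess) ? err : freeErr;
}

int main(void)
{
    // In the real application: launch A, call evictL2(), then launch and time B.
    cudaError_t err = evictL2();
    printf("evictL2: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Call `evictL2()` after A finishes and before B starts, and keep it outside the timed region so the allocation and memset do not count against B.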