I am new to CUDA programming. I have an application with two parts, A and B, where B has data dependencies on A. I want to see how well B performs without the caching effect, i.e. B should not benefit from data that A previously brought into the cache. Is there any way of achieving this? I know that one possible approach is to write a kernel that pollutes the cache, but this is not very elegant. Do CUDA or the NVIDIA tools provide any mechanism that allows me to do this?
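To make it concrete, below is a rough sketch of the cache-polluting approach I have in mind: a kernel launched between A and B that walks a buffer several times larger than the device's L2 cache, so B starts with a cold cache. The buffer multiplier, launch configuration, and the commented-out kernel_A / kernel_B names are just placeholders of mine, not anything prescribed by CUDA.

```cuda
#include <cuda_runtime.h>

// Strides through a large buffer so its lines displace whatever A left in L2.
__global__ void pollute_cache(int *buf, size_t n)
{
    size_t idx    = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = idx; i < n; i += stride)
        buf[i] += 1;   // read-modify-write forces each line through the cache
}

int main(void)
{
    // Query the real L2 size and allocate a few multiples of it.
    int l2_bytes = 0;
    cudaDeviceGetAttribute(&l2_bytes, cudaDevAttrL2CacheSize, 0);
    size_t n = 4ULL * (size_t)l2_bytes / sizeof(int);

    int *buf;
    cudaMalloc(&buf, n * sizeof(int));
    cudaMemset(buf, 0, n * sizeof(int));

    // kernel_A<<<...>>>(...);              // hypothetical kernel A
    pollute_cache<<<256, 256>>>(buf, n);    // evict A's working set
    cudaDeviceSynchronize();
    // kernel_B<<<...>>>(...);              // hypothetical kernel B, the one being measured

    cudaFree(buf);
    return 0;
}
```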
As far as I know, the state of the GPU L1 cache does not persist between kernel launches, but I am not sure about L2.