Texture Cache Coherency

The CUDA programming guide says the following about texture cache coherency:
“within the same kernel call, the texture cache is not kept coherent with respect to global memory writes”.

Is there a workaround for this? Through the driver API, or in OpenGL maybe?


I would assume the limitation comes from the way the hardware works. As I see it, the limitation doesn’t really matter. If some threads in a running grid read values that other threads have written (the only case where you would need cache coherency), then you automatically have a race condition.

No, a kernel cannot flush the texture cache by itself. What you can do is exploit its limited size, i.e. fetch enough other data between reads to evict the offending cache line.

But unless you are on a completely fixed platform, that’s a really horrible idea. To be honest, it’s also a pretty horrible idea on a fixed platform. ;)

This limitation matters when data dependencies exist in the computation, i.e. when one thread relies on data computed by another thread. The race condition could presumably be resolved with barriers, I guess.

Which will probably cause a deadlock :) Mind you, you don't have control over the scheduling of the blocks, and at some point one block might be stuck at your barrier waiting for another block to release it while that other block is not even active yet. It's not like on a CPU, where everything is running at the same time (i.e. all the threads you started are running and can really communicate with each other).

Barriers will also probably make the whole thing too slow, something like atomic adds.


Exactly. The only barrier across blocks available to you is to let the kernel complete execution and then start a second kernel launch. Reads through the texture cache are guaranteed not to be “stale” in that second launch: you will read everything written in the first.
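
For what it's worth, here is a minimal sketch of that two-launch pattern. It assumes the texture object API of the current CUDA runtime (cudaCreateTextureObject, tex1Dfetch) rather than whatever binding the original code uses, and the names produce, consume and twoPassDemo are made up for illustration. The first kernel writes to global memory, the second reads the same buffer through the texture path.

```cuda
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Pass 1: each thread writes its own result to global memory.
__global__ void produce(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 2.0f * i;
}

// Pass 2: each thread reads a value written by a *different* thread,
// going through the texture path.
__global__ void consume(cudaTextureObject_t tex, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, (i + 1) % n);
}

// Hypothetical helper: returns the number of mismatching elements,
// or -1 if a CUDA call failed.
int twoPassDemo(int n) {
    float *buf = nullptr, *out = nullptr;
    if (cudaMalloc(&buf, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) { cudaFree(buf); return -1; }

    // Describe the linear buffer so it can be read as a 1D texture.
    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = buf;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc td;
    memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) {
        cudaFree(buf); cudaFree(out);
        return -1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Two launches in the same (default) stream: the second kernel does not
    // start until the first has finished, so the launch boundary acts as the
    // barrier between writers and readers.
    produce<<<blocks, threads>>>(buf, n);
    consume<<<blocks, threads>>>(tex, out, n);

    std::vector<float> host(n);
    cudaError_t err = cudaMemcpy(host.data(), out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    int bad = 0;
    if (err != cudaSuccess) {
        bad = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (host[i] != 2.0f * ((i + 1) % n)) ++bad;
    }

    cudaDestroyTextureObject(tex);
    cudaFree(buf);
    cudaFree(out);
    return bad;
}
```

Calling twoPassDemo(1 << 16) from a host main should report no mismatches. Merging both passes into a single kernel is exactly the case the programming guide warns about, since the texture fetch could then return stale data.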