I was wondering if there is a function similar to __syncthreads() that ensures that all threads in a cooperative kernel “flush” their caches, so that subsequent reads really come from global memory.
The programming guide does not mention anything about memory in connection with grid.sync().
Do __syncthreads() and a following grid.sync() already do this?
The reason for my question is that I have a program which runs in stages. During a stage, a set of variables is read heavily but never written (so caching would be nice). At the end of each stage (after all threads have finished) these variables are updated, and they then stay unchanged until the end of the next stage.
If I could force all threads in all active thread blocks to read the variables from global memory (at least once) after each stage, I could avoid using volatile.
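To make the pattern concrete, here is a minimal sketch of what I mean. The names staged_kernel, run_stages, params and out are made up for illustration, and the tiny grid size (2 blocks of 64 threads) is an assumption chosen so that all blocks can be co-resident. The variables are read with plain (non-volatile) loads during a stage and updated by a single thread between two grid.sync() calls. I put a __threadfence() after the update as a conservative choice; whether it is needed, or whether grid.sync() alone is enough, is exactly what I am unsure about.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical staged kernel: params[0] is only read during a stage and is
// rewritten by one thread between stages.
__global__ void staged_kernel(int *params, int *out, int num_stages)
{
    cg::grid_group grid = cg::this_grid();
    unsigned long long tid = grid.thread_rank();

    for (int stage = 0; stage < num_stages; ++stage) {
        // Read phase: plain load, no volatile, so caching is allowed.
        out[tid] += params[0];

        grid.sync();  // every thread in the grid has finished reading

        if (tid == 0) {
            params[0] += 1;   // update for the next stage
            __threadfence();  // conservative; may be redundant with grid.sync()
        }

        grid.sync();  // update is done before any thread starts the next stage
    }
}

// Returns the number of threads whose result differs from the expected sum,
// or -1 if cooperative launch is unsupported or a CUDA call fails.
int run_stages(int num_stages)
{
    int coop = 0;
    if (cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, 0) != cudaSuccess || !coop)
        return -1;

    const int blocks = 2, threads = 64, n = blocks * threads;
    int *d_params = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_params, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) { cudaFree(d_params); return -1; }

    int one = 1;
    cudaMemcpy(d_params, &one, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(int));

    // grid.sync() requires a cooperative launch, not the <<<...>>> syntax.
    void *args[] = { &d_params, &d_out, &num_stages };
    cudaError_t err = cudaLaunchCooperativeKernel((void *)staged_kernel,
                                                  dim3(blocks), dim3(threads),
                                                  args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    std::vector<int> h_out(n, 0);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_params);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // params[0] takes the values 1, 2, ..., num_stages, one per stage.
    const int expected = num_stages * (num_stages + 1) / 2;
    int mismatches = 0;
    for (int v : h_out)
        if (v != expected) ++mismatches;
    return mismatches;
}
```

Calling run_stages(4) from main and checking for a return value of 0 would tell me whether any thread saw a stale value in this small test, although a passing run on one GPU is of course not a guarantee. Depending on the toolkit version, the file may need to be compiled with -rdc=true for grid.sync() to link.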