I have a cooperative group launch, and somewhere in the kernel I do the following:
Write to global memory from threads
Grid sync
Read value from global memory from different threads
Do something with value
Printf value
The printed value is wrong. If I instead do:
Write to global memory from threads
Grid sync
Read value from global memory from different threads
Printf value
Do something with value
Printf value
The printed value is correct.
I don’t understand how this can be the case as the grid sync is between the write and the read. Is there something I might be missing? Does a grid sync only guarantee sync of threads and not global memory?
Also, I get the grid group with “cg::grid_group grid = cg::this_grid();” and sync with “cg::sync(grid);”.
Oh, don’t know why I didn’t consider that, thanks!
So I had something like:
data[id] = value;
cg::sync(grid);
value = data[id];
I made a pointer to data like “volatile float* vdata = data;” and peppered in __threadfence() calls, and I’m still getting versions of the same issue. Is there something extra I might need?
As far as I know, it is not documented anywhere that a grid sync provides a memory barrier. However, my previous statement that it does not may be incorrect (the reductionMultiBlockCG sample code would seem to suggest that it does).
So I edited my previous statement.
I’m not able to speculate why your code is not working the way you would like.
If you provide a complete, testable example, perhaps someone will be able to help you. Note that this doesn’t have to be, and probably shouldn’t be, your whole code. Instead, create a standalone, compilable example that demonstrates the issue with extraneous items removed.
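As a starting point, a standalone test of the write / grid sync / read pattern might look like the sketch below. It is not the original poster's code: the kernel name writeSyncRead, the out buffer, and the mirrored index n - 1 - id are made up for illustration. It assumes a device that supports cooperative launch, and older toolkits may additionally need -rdc=true when compiling.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            printf("CUDA error: %s (line %d)\n",                      \
                   cudaGetErrorString(e_), __LINE__);                 \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel: each thread writes one element, the whole grid
// synchronizes, then each thread reads an element written by a different
// thread (usually one in another block).
__global__ void writeSyncRead(float* data, float* out, int n)
{
    cg::grid_group grid = cg::this_grid();
    int id = blockIdx.x * blockDim.x + threadIdx.x;

    if (id < n) data[id] = static_cast<float>(id);

    // Every thread in the grid must reach this call, so it sits outside
    // the bounds checks.
    cg::sync(grid);

    if (id < n) out[id] = data[n - 1 - id];
}

int main()
{
    int dev = 0, supported = 0, numSms = 0, blocksPerSm = 0;
    CHECK(cudaGetDevice(&dev));
    CHECK(cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev));
    if (!supported) { printf("cooperative launch not supported\n"); return 0; }

    // A cooperative launch requires all blocks to be resident at once,
    // so the grid is sized from the occupancy calculation.
    const int threads = 128;
    CHECK(cudaDeviceGetAttribute(&numSms, cudaDevAttrMultiProcessorCount, dev));
    CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSm, writeSyncRead, threads, 0));
    int blocks = blocksPerSm * numSms;
    int n = blocks * threads;

    float *data = nullptr, *out = nullptr;
    CHECK(cudaMalloc(&data, n * sizeof(float)));
    CHECK(cudaMalloc(&out, n * sizeof(float)));

    // Kernels that use grid sync must be launched cooperatively.
    void* args[] = { &data, &out, &n };
    CHECK(cudaLaunchCooperativeKernel((void*)writeSyncRead, dim3(blocks),
                                      dim3(threads), args, 0, 0));
    CHECK(cudaDeviceSynchronize());

    std::vector<float> host(n);
    CHECK(cudaMemcpy(host.data(), out, n * sizeof(float),
                     cudaMemcpyDeviceToHost));

    // Count the threads that did not see the value written before the sync.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != static_cast<float>(n - 1 - i)) ++mismatches;
    printf("mismatches: %d of %d\n", mismatches, n);

    CHECK(cudaFree(data));
    CHECK(cudaFree(out));
    return mismatches != 0;
}
```

If something this small reports no mismatches while the full code still misbehaves, that would point toward something else in the full kernel, for example threads that skip the sync or a launch that is not cooperative, rather than the grid sync itself.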