HLSL synchronization of UAV access using GroupMemoryBarrierWithGroupSync.
Hi all.
In an HLSL compute shader, we can synchronize read-before-write access to groupshared data as follows:
int neighbourValue = sharedData[neighbourIndex];
// Make sure all threads have read the data before overwriting it.
GroupMemoryBarrierWithGroupSync();
// Overwrite our item with the neighbour's value.
sharedData[GI] = neighbourValue;
The GroupMemoryBarrierWithGroupSync() call ensures that all shared data is read before any thread in the group overwrites a value. This is a very common technique and works perfectly well.
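For reference, the fragment above could sit in a complete shader along these lines. This is only a minimal sketch: the group size, the entry-point name `CSMain`, the neighbour rule and the `result` output buffer are assumptions made for illustration, not part of my actual code.

```hlsl
// Hypothetical group size; any size that fits in one thread group would do.
#define GROUP_SIZE 64

// Output buffer (assumed) so the compiler cannot discard the whole shader.
RWStructuredBuffer<int> result : register(u0);

groupshared int sharedData[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint GI : SV_GroupIndex)
{
    // Give every slot a known value and wait until all slots are written.
    sharedData[GI] = (int)GI;
    GroupMemoryBarrierWithGroupSync();

    // Assumed neighbour rule: the next thread in the group, wrapping around.
    uint neighbourIndex = (GI + 1) % GROUP_SIZE;
    int neighbourValue = sharedData[neighbourIndex];

    // Make sure all threads have read the data before overwriting it.
    GroupMemoryBarrierWithGroupSync();

    // Overwrite our item with the neighbour's value.
    sharedData[GI] = neighbourValue;

    // Each thread reads back only the slot it wrote itself, so no further
    // barrier is needed here.
    result[GI] = sharedData[GI];
}
```

With this neighbour rule, each thread should end up holding the initial value of the next thread in the group.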
I have set up code that attempts to do the same thing with UAV data rather than groupshared data, as follows:
int neighbourValue = uavData[neighbourIndex];
// Make sure all threads in the group have read the data before overwriting it.
AllMemoryBarrierWithGroupSync();
// Overwrite our item with the neighbour's value.
uavData[localIndex] = neighbourValue;
When this code is compiled, the compiler fails to emit the required sync_uglobal_g_t instruction in the assembly output, so the code has a race condition.
Does anyone know the reason for this?
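In case anyone wants to reproduce it, a self-contained version of the UAV case might look like the sketch below. The buffer type, register slot, group size, entry-point name and neighbour rule are all assumptions chosen for illustration; only the read, barrier, write sequence comes from my real code.

```hlsl
// Hypothetical group size; a single thread group is dispatched.
#define GROUP_SIZE 64

// UAV holding at least GROUP_SIZE ints (type and register are assumptions).
RWStructuredBuffer<int> uavData : register(u0);

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint localIndex : SV_GroupIndex)
{
    // Assumed neighbour rule: the next thread in the group, wrapping around.
    uint neighbourIndex = (localIndex + 1) % GROUP_SIZE;
    int neighbourValue = uavData[neighbourIndex];

    // Make sure all threads in the group have read the data before overwriting it.
    AllMemoryBarrierWithGroupSync();

    // Overwrite our item with the neighbour's value.
    uavData[localIndex] = neighbourValue;
}
```

The assembly listing can be dumped with something like `fxc /T cs_5_0 /E CSMain /Fc out.asm shader.hlsl` (the file names are placeholders) and then searched for the sync instruction.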
I can put in workarounds that use groupshared data accesses to force the sync_uglobal_g_t instruction to be included (in which case everything works as expected), but it would be far preferable if this were not necessary. It seems to me as though this is a DirectCompute bug.
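A related alternative, which is not the same as forcing the instruction, is to move the neighbour exchange itself into groupshared memory so that each thread only ever reads and writes its own UAV element. The sketch below assumes a single thread group and the same hypothetical buffer, group size and neighbour rule as before; the `staged` array is a name made up for this example.

```hlsl
// Hypothetical group size; a single thread group is dispatched.
#define GROUP_SIZE 64

// UAV holding at least GROUP_SIZE ints (type and register are assumptions).
RWStructuredBuffer<int> uavData : register(u0);

// Staging area for the exchange (hypothetical name).
groupshared int staged[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint localIndex : SV_GroupIndex)
{
    // Each thread copies its OWN UAV element into groupshared memory.
    staged[localIndex] = uavData[localIndex];

    // Ordinary groupshared barrier: all staged values are visible afterwards.
    GroupMemoryBarrierWithGroupSync();

    // Assumed neighbour rule: the next thread in the group, wrapping around.
    uint neighbourIndex = (localIndex + 1) % GROUP_SIZE;

    // The cross-thread read now happens in groupshared memory, so no thread
    // reads a UAV element that another thread writes.
    uavData[localIndex] = staged[neighbourIndex];
}
```

This sidesteps the UAV hazard rather than fixing the missing instruction, and it costs GROUP_SIZE ints of groupshared memory, so it only helps when the accesses stay within one group.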
Note: The type of memory barrier used should not be relevant, since it is the group sync aspect which is important, and indeed it makes no difference which type of memory barrier I use.
Also note: There is no cross-access of UAV data between thread groups; in fact, my examples use only a single thread group.