Conflicts when reading global memory

Hi everyone,

If several threads from different blocks read the same global memory location, will that cause any errors?


PS: While reading the SDK, I found some code in the particles example that I can't understand.

It is at around line 183 of the file; see the highlighted line in the following code.

[codebox]__global__ void
reorderDataAndFindCellStartD(uint2*  particleHash,  // particle id sorted by hash
                             float4* oldPos,
                             float4* oldVel,
                             float4* sortedPos,
                             float4* sortedVel,
                             uint*   cellStart)
{
    int index = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

    uint2 sortedData = particleHash[index];

    // Load hash data into shared memory so that we can look
    // at neighboring particle's hash value without loading
    // two hash values per thread
    __shared__ uint sharedHash[257];
    sharedHash[threadIdx.x+1] = sortedData.x;
    if (index > 0 && threadIdx.x == 0)
    {
        // first thread in block must load neighbor particle hash
        volatile uint2 prevData = particleHash[index-1];   // <-- highlighted line
        sharedHash[0] = prevData.x;
    }

    // all shared-memory writes must be complete before a neighbor's hash is read
    __syncthreads();

    if (index == 0 || sortedData.x != sharedHash[threadIdx.x])
    {
        cellStart[sortedData.x] = index;
    }

    // Now use the sorted index to reorder the pos and vel data
    float4 pos = FETCH(oldPos, sortedData.y);       // macro does either global read or texture fetch
    float4 vel = FETCH(oldVel, sortedData.y);       // see particles_kernel.cuh

    sortedPos[index] = pos;
    sortedVel[index] = vel;
}[/codebox]


Why is the volatile qualifier used here? Is it necessary?

Thanks again.

You won’t get runtime errors if the same global memory location is loaded from or stored to by multiple threads “simultaneously”, whether they are in the same block or in different blocks. However, all sorts of data inconsistencies and race conditions can occur if multiple threads store to and load from the same locations without some form of serialization or atomic access mechanism.
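To make the distinction concrete, here is a minimal sketch, not taken from the SDK. Every thread reads one shared input word, then increments two counters: one with a plain read-modify-write and one with `atomicAdd`. The names `countKernel` and `runCountDemo` and the value 42 are made up for illustration.

```cuda
#include <cuda_runtime.h>

// Every thread reads the same input word, then bumps two counters:
// one with a plain read-modify-write (racy) and one with atomicAdd.
__global__ void countKernel(const unsigned int* sharedInput,
                            unsigned int* racyCount,
                            unsigned int* atomicCount,
                            unsigned int* mismatches)
{
    // Concurrent reads of one location: nothing writes it during the
    // kernel, so every thread sees the same value.
    unsigned int v = *sharedInput;
    if (v != 42u) atomicAdd(mismatches, 1u);

    // Unsynchronized read-modify-write: increments from other threads
    // can be overwritten and lost.
    *racyCount = *racyCount + 1u;

    // Atomic read-modify-write: each increment is applied exactly once.
    atomicAdd(atomicCount, 1u);
}

// Hypothetical host helper: launches the kernel and copies back
// {racyCount, atomicCount, mismatches}. Returns false on any CUDA error.
bool runCountDemo(int blocks, int threadsPerBlock, unsigned int out[3])
{
    unsigned int hostInit[4] = {42u, 0u, 0u, 0u};
    unsigned int* dev = nullptr;
    if (cudaMalloc(&dev, sizeof(hostInit)) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, hostInit, sizeof(hostInit),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        countKernel<<<blocks, threadsPerBlock>>>(dev, dev + 1, dev + 2, dev + 3);
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }
    if (ok) {
        ok = cudaMemcpy(out, dev + 1, 3 * sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Calling `runCountDemo(64, 256, out)` from `main` should leave `out[1]` equal to the total thread count and `out[2]` equal to zero. `out[0]` is typically much smaller than the thread count, and its exact value can change from run to run. No error is reported in either case; the race only shows up as a wrong result.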

Thank you for the reply.

What actually happens when a race condition occurs?

I know that if multiple threads store data to the same location in global memory simultaneously, errors can occur. By errors, I mean that the program may produce unexpected results.

However, I thought it was fine for multiple threads to read data from the same address in global memory. From what you said, should I understand that unexpected results can also occur when multiple threads only read the same location in global memory?

The use of volatile here is an optimization to ensure memory coalescing. Only the ‘x’ component of “prevData” is used in this function, so the compiler could otherwise narrow the load to that single component. The volatile qualifier ensures that the compiler reads the full uint2 value, which gives coalesced reads.
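The pattern can be tried in isolation with a small sketch like the one below. It is not the SDK code: the names `markCellStarts` and `runMarkDemo` are hypothetical, and each thread loads its predecessor directly from global memory instead of going through shared memory. Whether the compiler actually emits a single 64-bit load depends on the compiler version and target architecture, so it is worth checking the output of `nvcc -ptx` with and without the qualifier.

```cuda
#include <cuda_runtime.h>

// Writes 1 to isStart[i] when element i has a different hash (.x) than
// element i-1, similar to the boundary check in the quoted kernel.
__global__ void markCellStarts(const uint2* hashes, unsigned int* isStart, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    uint2 cur = hashes[i];
    unsigned int prevHash = 0u;   // unused when i == 0
    if (i > 0) {
        // volatile is meant to keep the load of the whole uint2,
        // even though only .x is used afterwards.
        volatile uint2 prev = hashes[i - 1];
        prevHash = prev.x;
    }
    isStart[i] = (i == 0 || cur.x != prevHash) ? 1u : 0u;
}

// Hypothetical host helper: copies n hashes to the device, runs the kernel,
// and copies the flags back. Returns false on any CUDA error.
bool runMarkDemo(const uint2* hostHashes, unsigned int* hostFlags, int n)
{
    uint2* dHashes = nullptr;
    unsigned int* dFlags = nullptr;
    bool ok = cudaMalloc(&dHashes, n * sizeof(uint2)) == cudaSuccess
           && cudaMalloc(&dFlags, n * sizeof(unsigned int)) == cudaSuccess
           && cudaMemcpy(dHashes, hostHashes, n * sizeof(uint2),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        markCellStarts<<<(n + threads - 1) / threads, threads>>>(dHashes, dFlags, n);
        ok = cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(hostFlags, dFlags, n * sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dHashes);   // cudaFree accepts a null pointer
    cudaFree(dFlags);
    return ok;
}
```

The result is the same with or without volatile; the qualifier only affects how the load is generated, so removing it should not change program correctness.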

Thank you very much.