What is double buffering?


Can anybody tell me what double buffering is, in CUDA or in general, and why we use it?


Say you want to convert the values in a buffer to something else (say, Fahrenheit to Celsius). Each thread works on one element and converts it in place. No problem.

However, say each thread has to look at its left and right neighbours and replace its own element with the maximum or the average, or in general some function of all three. Then you have a race condition: a thread may read a neighbour's element after that neighbour has already overwritten it.

That's when you use double buffering. All threads read from one buffer and write to the other, and then the two buffers swap roles.

Usually double buffering happens in a loop: BufferA is read, BufferB is written, __syncthreads(); BufferB is read, BufferA is written, __syncthreads(); and so on.
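To make the read-A/write-B loop concrete, here is a minimal sketch in plain C++ that emulates it on the CPU, with an ordinary loop standing in for "one thread per element". The function names neighbour_max_pass and neighbour_max are made up for this illustration, and the neighbour-maximum update is just the example from above, not code from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One double-buffered pass: every element of `out` becomes the maximum of the
// matching element of `in` and its left and right neighbours. Only `in` is
// read and only `out` is written, so the order in which the elements are
// processed cannot change the result.
void neighbour_max_pass(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    const std::size_t n = in.size();
    for (std::size_t i = 0; i < n; ++i) {   // on a GPU: one thread per i
        float m = in[i];
        if (i > 0)     m = std::max(m, in[i - 1]);
        if (i + 1 < n) m = std::max(m, in[i + 1]);
        out[i] = m;
    }
}

// Runs `passes` passes, swapping the roles of the two buffers after each one
// (A read / B written, then B read / A written, and so on).
std::vector<float> neighbour_max(std::vector<float> a, int passes) {
    std::vector<float> b(a.size());
    for (int p = 0; p < passes; ++p) {
        neighbour_max_pass(a, b);
        std::swap(a, b);   // in a kernel, the __syncthreads() belongs here
    }
    return a;              // after the swap, `a` holds the latest result
}
```

For example, neighbour_max({1, 5, 2, 0, 0}, 1) gives {5, 5, 5, 2, 0}, no matter in which order the loop visits the elements.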

Thanks Sarnath for your quick reply!

Well, the following implements a double-buffered version of the sum scan:

for d := 1 to log2(n) do
	forall k in parallel do
		if k ≥ 2^(d−1) then
			x[out][k] := x[in][k − 2^(d−1)] + x[in][k]
		else
			x[out][k] := x[in][k]

I am not able to find where BUFFER ‘A’ and BUFFER ‘B’ are in the above code.

The corresponding CUDA code is:

__global__ void scan(float *g_odata, float *g_idata, int n)
{
	extern __shared__ float temp[]; // allocated on invocation; must hold 2*n floats
	int thid = threadIdx.x;
	int pout = 0, pin = 1;

	// load input into shared memory.
	// This is exclusive scan, so shift right by one and set first elt to 0
	temp[pout*n + thid] = (thid > 0) ? g_idata[thid-1] : 0;
	__syncthreads(); // every element is loaded before any thread reads a neighbour

	for (int offset = 1; offset < n; offset *= 2)
	{
		pout = 1 - pout; // swap double buffer indices
		pin = 1 - pout;
		if (thid >= offset)
			temp[pout*n+thid] = temp[pin*n+thid] + temp[pin*n+thid - offset];
		else
			temp[pout*n+thid] = temp[pin*n+thid];
		__syncthreads(); // all writes to the "out" half finish before it is read
	}

	g_odata[thid] = temp[pout*n+thid]; // write output
}


“in” and “out” are the two buffers. Note that the last statement in the first code box comes under the OUTER loop. That's where the buffers are exchanged, isn't it?

Although in and out are just indices, the way they are used in the code says that they operate at different dimensions of a multi-dimensional array. Just think about it.
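One way to picture it, as a rough sketch only: a flat array of 2*n floats can be treated as x[2][n], where the first index picks the buffer and the second picks the element. The TwoBuffers struct below is a hypothetical helper written for this illustration, in plain C++ rather than CUDA. Its index formula is the same shape as temp[pout*n + thid] in the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: views a flat array of 2*n floats as x[2][n].
struct TwoBuffers {
    std::size_t n;
    std::vector<float> flat;   // row 0 lives in [0, n), row 1 in [n, 2n)

    explicit TwoBuffers(std::size_t n_) : n(n_), flat(2 * n_, 0.0f) {}

    // row selects the buffer (0 or 1), k selects the element inside it.
    float& at(int row, std::size_t k) {
        assert((row == 0 || row == 1) && k < n);
        return flat[static_cast<std::size_t>(row) * n + k];
    }
};
```

With n = 4, at(0, 1) is flat[1] and at(1, 1) is flat[5], so the two rows never share an element even though they sit in one allocation.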

“in” and “out” always index the first dimension of x, so I simply did not get how they are operating at different dimensions :(

Even if they are in the same dimension, they are different buffers, aren't they? x[0][k] and x[1][k] are different memory locations for every k.

The code reads one buffer and updates the other. If there were no second buffer, there would be a lot of races, wouldn't there?
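To see what goes wrong without the second buffer, here is a small CPU emulation in plain C++, a sketch under a few assumptions: the function names scan_double_buffered and scan_single_buffer are made up, and the ascending loop order stands for just one of the many orders in which GPU threads might run. The first function follows the same pin/pout scheme as the kernel; the second updates a single buffer in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive scan with two buffers packed into one array of 2*n floats.
std::vector<float> scan_double_buffered(const std::vector<float>& input) {
    const std::size_t n = input.size();
    std::vector<float> temp(2 * n, 0.0f);
    std::size_t pout = 0, pin = 1;
    for (std::size_t k = 0; k < n; ++k)              // shift right by one
        temp[pout * n + k] = (k > 0) ? input[k - 1] : 0.0f;
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;                             // swap buffer roles
        pin = 1 - pout;
        for (std::size_t k = 0; k < n; ++k) {        // "one thread per k"
            if (k >= offset)
                temp[pout * n + k] = temp[pin * n + k] + temp[pin * n + k - offset];
            else
                temp[pout * n + k] = temp[pin * n + k];
        }
    }
    return std::vector<float>(temp.begin() + pout * n,
                              temp.begin() + pout * n + n);
}

// Same update done in place: later elements read values that earlier
// elements have already overwritten during the same step.
std::vector<float> scan_single_buffer(const std::vector<float>& input) {
    const std::size_t n = input.size();
    std::vector<float> temp(n, 0.0f);
    for (std::size_t k = 0; k < n; ++k)
        temp[k] = (k > 0) ? input[k - 1] : 0.0f;
    for (std::size_t offset = 1; offset < n; offset *= 2)
        for (std::size_t k = offset; k < n; ++k)
            temp[k] += temp[k - offset];             // may read an updated value
    return temp;
}
```

For the input {1, 2, 3, 4} the double-buffered version returns {0, 1, 3, 6}, the correct exclusive scan, while the in-place version processed in ascending order returns {0, 1, 3, 7}. On a real GPU the in-place result would depend on thread timing, which is exactly the race.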