Question about the histogram64 sample

hakuna · October 11, 2007, 8:43am

When I read the histogram64 sample, several things puzzled me. In the kernel, threadPos is computed with three bit operations:

```
//Encode thread index in order to avoid bank conflicts in s_Hist[] access:
//each half-warp accesses consecutive shared memory banks
//and the same bytes within the banks
const int threadPos =
    //[31 : 6] <== [31 : 6]
    ((threadIdx.x & (~63)) >> 0) |
    //[5  : 2] <== [3  : 0]
    ((threadIdx.x &    15) << 2) |
    //[1  : 0] <== [5  : 4]
    ((threadIdx.x &    48) >> 4);
```

Why is it done like that? Is there a particular reason?

My other question:

In the host code, the h_Data array is filled with values in the range [0, 255]. So why does the kernel say that only a 64-bin histogram of the 8-bit input data is calculated, and that only the highest 6 bits of each 8-bit data element are extracted?

```
//Cycle through current block, update per-thread histograms
//Since only 64-bin histogram of 8-bit input data array is calculated,
//only highest 6 bits of each 8-bit data element are extracted,
//leaving out 2 lower bits.
for(int pos = threadIdx.x; pos < dataSize; pos += blockDim.x){
    //data4 packs four 8-bit pixels; shifts of 2, 10, 18 and 26 keep the top 6 bits of each byte
    unsigned int data4 = d_Data[baseIndex + pos];
    addPixel64(s_Hist, threadPos, (data4 >>  2) & 0x3FU);
    addPixel64(s_Hist, threadPos, (data4 >> 10) & 0x3FU);
    addPixel64(s_Hist, threadPos, (data4 >> 18) & 0x3FU);
    addPixel64(s_Hist, threadPos, (data4 >> 26) & 0x3FU);
}
```

I am confused by this code and would appreciate a detailed explanation. I have read histogram.pdf but couldn't find the answer there.

Any reply would be appreciated.

wumpus · October 11, 2007, 11:41am

The first one (the threadPos encoding) serves to work around shared memory bank conflicts in the s_Hist[] accesses; if you replace it with plain threadIdx.x, the code still produces the same results, but it will be (a bit) slower.

hakuna · October 12, 2007, 2:27am

Then why, if the gray levels range from 0 to 255, are there only 64 bins, with only the highest 6 bits of each h_Data value extracted? We know that the shared memory access would generate a 4-way bank conflict, but how do we know how to shift all the bits?
