histogram w/512 bins using integers as input data

mmaurier · June 26, 2009, 6:21pm

Can the histogram256 sample code be changed to use 512 bins, with integers instead of chars as input data?
Could somebody show a working code sample?

Thanks,

Marco

jph4599 · June 26, 2009, 8:59pm

Duplicate post: http://forums.nvidia.com/index.php?showtopic=100207

I’d imagine it is possible. Have you tried reading the histogram.pdf document included in the doc directory of the histogram256 sample? That is probably the first place I would check for hints about how to do this.

cvnguyen · June 27, 2009, 5:43am

I can think of two methods:

1. Reduce the block size.
2. Call histogram256 twice: once to count values 0-255 and once to count values 256-511.
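The second method can be sketched on the CPU to show how the two passes fit together. This is only an illustration under assumptions: `histogram256Pass` and `histogram512TwoPass` are hypothetical names, not part of the SDK sample, and the real histogram256 kernel takes byte input, so the data would still have to be remapped before each call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one 256-bin pass: counts only the values in
// [base, base + 256) and writes them to the 256 bins starting at out[0].
void histogram256Pass(unsigned int* out,
                      const std::vector<unsigned int>& data,
                      unsigned int base) {
    assert(out != nullptr);
    for (int b = 0; b < 256; ++b) out[b] = 0;
    for (unsigned int v : data) {
        // Values outside this pass's range are ignored here; the other
        // pass (or nobody, if v >= 512) picks them up.
        if (v >= base && v - base < 256u) ++out[v - base];
    }
}

// Builds a 512-bin histogram from two 256-bin passes over the same data.
std::vector<unsigned int> histogram512TwoPass(const std::vector<unsigned int>& data) {
    std::vector<unsigned int> result(512, 0u);
    histogram256Pass(result.data(), data, 0u);          // bins 0..255
    histogram256Pass(result.data() + 256, data, 256u);  // bins 256..511
    return result;
}
```

The cost of this approach is that the input is read twice, so it trades bandwidth for not having to touch the kernel's bin count.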

mmaurier · July 1, 2009, 5:14am

Here is a 512-bin histogram with integer input, using the SM11 version only.

In histogram256.cpp

change
unsigned char *h_Data;
unsigned int *h_ResultCPU, *h_ResultGPU;
unsigned char *d_Data;
to
unsigned int *h_Data;
unsigned int *h_ResultCPU, *h_ResultGPU;
unsigned int *d_Data;

change
histogram256CPU(
h_ResultCPU,
(unsigned int *)h_Data,
DATA_N / 4
);
to
histogram256CPU(
h_ResultCPU,
(unsigned int *)h_Data,
DATA_N
);

In histogram256_common.h

change
#define BIN_COUNT 256
to
#define BIN_COUNT 512

In histogram256_gold.cpp

change the counting loop to

for (i = 0; i < dataN; i++){
    // Each input word is now one bin index instead of four packed bytes.
    h_Result[ h_Data[i] ]++;
    //data4 = h_Data[i];
    //h_Result[(data4 >>  0) & 0xFF]++;
    //h_Result[(data4 >>  8) & 0xFF]++;
    //h_Result[(data4 >> 16) & 0xFF]++;
    //h_Result[(data4 >> 24) & 0xFF]++;
}
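One thing the loop above relies on is that every input value is below BIN_COUNT (512); a larger value would index past the end of the histogram. As a minimal sketch of a guarded CPU reference, assuming nothing about the SDK beyond the loop shown, something like the following could be used. `histogram512CPU` and `kBinCount` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constant mirroring the BIN_COUNT change described in the post.
constexpr unsigned int kBinCount = 512;

// Counts one bin per 32-bit input word. Values >= kBinCount are skipped,
// because using them as an index would write out of bounds.
// Returns the number of skipped values so the caller can detect bad input.
std::size_t histogram512CPU(std::vector<unsigned int>& result,
                            const unsigned int* data,
                            std::size_t dataN) {
    assert(data != nullptr || dataN == 0);
    result.assign(kBinCount, 0u);
    std::size_t skipped = 0;
    for (std::size_t i = 0; i < dataN; ++i) {
        if (data[i] < kBinCount) {
            ++result[data[i]];
        } else {
            ++skipped;
        }
    }
    return skipped;
}
```

A nonzero return value means the input was not restricted to 0-511, in which case the unguarded loop and the kernel would both be writing outside their histograms.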

In histogram256_kernel.cuh

change the main loop to
for(int pos = globalTid; pos < dataN; pos += numThreads){
    unsigned int data4 = d_Data[pos];
    // Each 32-bit word is now a single bin index (0..511), not four packed bytes.
    addData256(s_Hist + warpBase, data4, threadTag);
    //addData256(s_Hist + warpBase, (data4 >>  0) & 0xFFU, threadTag);
    //addData256(s_Hist + warpBase, (data4 >>  8) & 0xFFU, threadTag);
    //addData256(s_Hist + warpBase, (data4 >> 16) & 0xFFU, threadTag);
    //addData256(s_Hist + warpBase, (data4 >> 24) & 0xFFU, threadTag);
}

Also change the kernel launch from
histogram256Kernel<<<BLOCK_N, THREAD_N>>>(d_Result256, d_Data, dataN / 4);
to
histogram256Kernel<<<BLOCK_N, THREAD_N>>>(d_Result256, d_Data, dataN);

With these changes, the kernel histogram should run about 30 times faster than the host histogram.
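Before trusting a timing figure, it is worth confirming that the GPU and CPU histograms actually match after the edits. A minimal sketch of such a check is below; `histogramsAgree` is a hypothetical helper, not something from the sample, and it assumes both results have already been copied into host vectors.

```cpp
#include <cstddef>
#include <vector>

// Returns true when both histograms have the same number of bins, every bin
// matches, and the bins add up to the number of input elements (dataN).
bool histogramsAgree(const std::vector<unsigned int>& cpu,
                     const std::vector<unsigned int>& gpu,
                     std::size_t dataN) {
    if (cpu.size() != gpu.size()) return false;
    unsigned long long total = 0;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        if (cpu[i] != gpu[i]) return false;
        total += gpu[i];
    }
    // A total below dataN suggests lost updates or out-of-range input.
    return total == dataN;
}
```

The sum check catches the case where both sides agree with each other but some input values never landed in any bin.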