Can the sample code histogram256 be changed to 512 bins, using integers instead of the char type as input data?
Could somebody show a working code sample?
Thanks,
Marco
Duplicate post: http://forums.nvidia.com/index.php?showtopic=100207
I'd imagine it is possible. Have you tried reading the histogram.pdf document included in the doc directory of the histogram256 sample? That is probably the first place I would check for hints about how to do this.
I can think of two methods:
1. Reduce the block size.
2. Call histogram256 twice: once to count the values 0-255 and once to count the values 256-511.
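To make the second method concrete, here is a minimal host-side sketch of the two-pass idea in plain C++. It is not the SDK code: histogram256Pass and histogram512TwoPass are hypothetical helper names, and the pass function only stands in for whatever 256-bin routine you actually call. Each pass counts the values that fall in its own 256-wide window and ignores everything else.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one 256-bin histogram pass (NOT the SDK's
// histogram256 signature). Counts only values in [base, base + 256)
// and stores them in bins 0..255 of `out`.
void histogram256Pass(const std::vector<std::uint32_t>& data,
                      std::uint32_t base,
                      std::uint32_t* out)
{
    for (std::size_t i = 0; i < 256; ++i) {
        out[i] = 0;
    }
    for (std::uint32_t v : data) {
        // Values outside this pass's window are skipped.
        if (v >= base && v - base < 256) {
            ++out[v - base];
        }
    }
}

// Build a 512-bin histogram from two 256-bin passes over the same input.
std::vector<std::uint32_t> histogram512TwoPass(const std::vector<std::uint32_t>& data)
{
    std::vector<std::uint32_t> result(512, 0);
    histogram256Pass(data, 0,   result.data());        // bins 0..255
    histogram256Pass(data, 256, result.data() + 256);  // bins 256..511
    return result;
}
```

The obvious cost is that the input is read twice, so this is likely slower than a single 512-bin pass.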
Histogram using SM11 only.
In histogram256.cpp, change
unsigned char *h_Data;
unsigned int *h_ResultCPU, *h_ResultGPU;
unsigned char *d_Data;
to
unsigned int *h_Data;
unsigned int *h_ResultCPU, *h_ResultGPU;
unsigned int *d_Data;
change
histogram256CPU(
    h_ResultCPU,
    (unsigned int *)h_Data,
    DATA_N / 4
);
to
histogram256CPU(
    h_ResultCPU,
    (unsigned int *)h_Data,
    DATA_N
);
In histogram256_common.h, change
#define BIN_COUNT 256
to
#define BIN_COUNT 512
In histogram256_gold.cpp, change the counting loop to
for (i = 0; i < dataN; i++){
    //one input value per element; assumes h_Data[i] < BIN_COUNT
    h_Result[ h_Data[i] ]++;
    //data4 = h_Data[i];
    //h_Result[(data4 >> 0) & 0xFF]++;
    //h_Result[(data4 >> 8) & 0xFF]++;
    //h_Result[(data4 >> 16) & 0xFF]++;
    //h_Result[(data4 >> 24) & 0xFF]++;
}
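The loop above indexes h_Result directly with the input value, so any value of 512 or more would write past the end of the array. As a rough standalone sketch (not the SDK file), a CPU reference with a range check could look like the following. The names makeInput and histogram512CPU are made up for illustration, and generating the test data with rand() % BIN_COUNT is my assumption about how you would keep the input inside the bin range.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr std::uint32_t BIN_COUNT = 512;

// Hypothetical helper: fill the input with pseudo-random values in
// [0, BIN_COUNT) so that every element maps to a valid bin.
std::vector<std::uint32_t> makeInput(std::size_t n, unsigned seed)
{
    std::srand(seed);
    std::vector<std::uint32_t> data(n);
    for (std::uint32_t& v : data) {
        v = static_cast<std::uint32_t>(std::rand()) % BIN_COUNT;
    }
    return data;
}

// CPU reference: one unsigned int per input value, BIN_COUNT bins.
// Returns the number of out-of-range values that were skipped.
std::size_t histogram512CPU(std::uint32_t* h_Result,
                            const std::uint32_t* h_Data,
                            std::size_t dataN)
{
    for (std::size_t i = 0; i < BIN_COUNT; ++i) {
        h_Result[i] = 0;
    }
    std::size_t skipped = 0;
    for (std::size_t i = 0; i < dataN; ++i) {
        if (h_Data[i] < BIN_COUNT) {
            h_Result[h_Data[i]]++;
        } else {
            ++skipped;  // would have been an out-of-bounds write
        }
    }
    return skipped;
}
```

If the returned count is nonzero, the input contains values the GPU kernel cannot bin either, so it is worth checking before comparing CPU and GPU results.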
In histogram256_kernel.cuh, change the main loop to
for(int pos = globalTid; pos < dataN; pos += numThreads){
    unsigned int data4 = d_Data[pos];
    addData256(s_Hist + warpBase, data4, threadTag);
    //addData256(s_Hist + warpBase, (data4 >> 0) & 0xFFU, threadTag);
    //addData256(s_Hist + warpBase, (data4 >> 8) & 0xFFU, threadTag);
    //addData256(s_Hist + warpBase, (data4 >> 16) & 0xFFU, threadTag);
    //addData256(s_Hist + warpBase, (data4 >> 24) & 0xFFU, threadTag);
}
Also change
histogram256Kernel<<<BLOCK_N, THREAD_N>>>(d_Result256, d_Data, dataN / 4);
to
histogram256Kernel<<<BLOCK_N, THREAD_N>>>(d_Result256, d_Data, dataN);
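One thing worth checking when doubling BIN_COUNT is whether the per-block histogram still fits in shared memory. The sketch below is only a back-of-the-envelope check in plain C++. It assumes the kernel keeps one sub-histogram of BIN_COUNT 32-bit counters per warp and runs 6 warps per block (WARP_N = 6 is my recollection of the sample, so verify it against your copy of histogram256_kernel.cuh), and that the device has the 16 KB of shared memory per multiprocessor of compute capability 1.x hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed values, check them against your copy of the sample.
constexpr std::size_t WARP_N = 6;       // warps per thread block (assumption)
constexpr std::size_t BIN_COUNT = 512;  // new bin count from histogram256_common.h
constexpr std::size_t SHARED_LIMIT_BYTES = 16 * 1024;  // compute capability 1.x

// Shared memory needed if every warp owns a private sub-histogram
// of `bins` 32-bit counters.
constexpr std::size_t blockHistogramBytes(std::size_t warps, std::size_t bins)
{
    return warps * bins * sizeof(std::uint32_t);
}

// 6 warps * 512 bins * 4 bytes = 12288 bytes, which is under 16 KB.
static_assert(blockHistogramBytes(WARP_N, BIN_COUNT) <= SHARED_LIMIT_BYTES,
              "per-block histogram does not fit in shared memory");
```

Under these assumptions 512 bins still fit, but going to 1024 bins with the same block size would not, which is where reducing the block size comes in.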
With these changes, the kernel histogram should run about 30 times faster than the host histogram.