luisgo
#1
Dear all,
I want to adapt the CUDA histogram sample (histogram256) to compute multiple histograms at once.
I have made the corresponding allocations.
I know that shared memory is allocated on a per-block basis, and that the number of blocks resident on an SMX is limited by the shared memory available.
I made the changes shown below to the code.
It runs without runtime errors, but the histograms are wrong. It did produce correct results once, so this looks like a synchronization problem.
Also, I do not understand why the tag is needed.
CUDA 9, Visual Studio 2013, GeForce 740M
Can someone help me, please?
Thanks
Luis Gonçalves
uint *h_HistogramGPU = (uint *)malloc(256 * sizeof(uint)*numberhist);
cudaMalloc((void **)&d_PartialHistograms, 240 * 256 * sizeof(uint)*numberhist);
cudaMalloc((void **)&d_Histogram, 256 * sizeof(uint)*numberhist);
__global__ void histogram256Kernel(uint *d_PartialHistograms, uint *d_Data, uint dataCount)
{
// Handle to thread block group
d_PartialHistograms += blockIdx.y * 240 * 256;
d_Data += blockIdx.y * (dataCount/sizeof(uint));
__global__ void mergeHistogram256Kernel(
uint *d_Histogram,
uint *d_PartialHistograms,
uint histogramCount
)
{
d_PartialHistograms += blockIdx.y * 240 * 256;
d_Histogram += 256 * blockIdx.y;
***********************************************************+
assert(byteCount % sizeof(uint) == 0);
dim3 grid1(PARTIAL_HISTOGRAM256_COUNT, numberhist, 1);
histogram256Kernel<<<grid1, HISTOGRAM256_THREADBLOCK_SIZE>>>(
d_PartialHistograms,
(uint *)d_Data,
byteCount
);
getLastCudaError("histogram256Kernel() execution failed\n");
dim3 grid2(HISTOGRAM256_BIN_COUNT, numberhist, 1);
mergeHistogram256Kernel<<<grid2, MERGE_THREADBLOCK_SIZE>>>(
d_Histogram,
d_PartialHistograms,
PARTIAL_HISTOGRAM256_COUNT
);
getLastCudaError("mergeHistogram256Kernel() execution failed\n");
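
To check each histogram independently, a CPU reference along the following lines could be compared against h_HistogramGPU after the results are copied back. This is only a minimal sketch: `referenceHistograms` and `firstMismatch` are hypothetical helpers, not part of the CUDA sample. It assumes that the input consists of `numberHist` segments of `bytesPerHist` bytes stored back to back, and that the output uses 256 consecutive bins per histogram, as the `d_Histogram += 256 * blockIdx.y` offset suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the CUDA sample).
// Assumption: `data` holds numberHist segments of bytesPerHist bytes each,
// stored contiguously. The result holds 256 bins per histogram, with
// histogram h starting at index h * 256.
std::vector<std::uint32_t> referenceHistograms(const std::vector<std::uint8_t>& data,
                                               std::size_t bytesPerHist,
                                               std::size_t numberHist)
{
    // The input must be large enough to cover every segment.
    assert(data.size() >= bytesPerHist * numberHist);

    std::vector<std::uint32_t> hist(256 * numberHist, 0u);
    for (std::size_t h = 0; h < numberHist; ++h) {
        const std::uint8_t* segment = data.data() + h * bytesPerHist;
        std::uint32_t* bins = hist.data() + h * 256;
        for (std::size_t i = 0; i < bytesPerHist; ++i) {
            ++bins[segment[i]];  // one count per input byte
        }
    }
    return hist;
}

// Hypothetical helper: returns the index of the first bin where the GPU
// result differs from the reference, or -1 if all bins match.
// Assumption: `gpu` points to at least ref.size() values.
std::ptrdiff_t firstMismatch(const std::vector<std::uint32_t>& ref,
                             const std::uint32_t* gpu)
{
    for (std::size_t i = 0; i < ref.size(); ++i) {
        if (ref[i] != gpu[i]) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Dividing the index returned by `firstMismatch` by 256 gives the histogram in which the first error occurs, which may help to tell an offset error (all histograms after the first are wrong) from a race (errors at varying positions between runs).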