Converting the Histogram sample to compute multiple histograms

Dear All

I want to convert the Histogram sample so that it computes multiple histograms at once.

I have made the proper allocations.

I know that shared memory is allocated on a per-block basis, and that an SMX holds as many resident blocks as its available shared memory allows.
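
To make that understanding concrete, it can be written as a small host-side calculation. This is only a sketch: `residentBlocksPerSM` is a made-up name, register limits are ignored, and every limit is passed in by the caller rather than queried from the device.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: upper bound on the number of blocks that can be
// resident on one SM at the same time. It only considers shared memory,
// thread count, and the hardware block limit (registers are ignored).
std::size_t residentBlocksPerSM(std::size_t sharedPerSM,
                                std::size_t sharedPerBlock,
                                std::size_t maxThreadsPerSM,
                                std::size_t threadsPerBlock,
                                std::size_t maxBlocksPerSM)
{
    // Blocks that fit if shared memory were the only limit.
    std::size_t bySharedMem = sharedPerSM / sharedPerBlock;
    // Blocks that fit if the thread count were the only limit.
    std::size_t byThreads = maxThreadsPerSM / threadsPerBlock;
    // The tightest of the three limits wins.
    return std::min({bySharedMem, byThreads, maxBlocksPerSM});
}
```

As an example with assumed numbers (48 KB of shared memory, 2048 threads, and 16 blocks per SMX; 192 threads and 6 KB of shared memory per block), `residentBlocksPerSM(48 * 1024, 6 * 1024, 2048, 192, 16)` is limited by shared memory to 8 blocks. These figures are assumptions for a Kepler-class GPU and the unmodified sample, not values measured on my card.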

I made the changes shown at the bottom of this post to the code.

The program ran without runtime errors, but the histograms are not right. It did give good results once, though, so it looks like a synchronization problem.

Also, I do not understand the need for the tag.
CUDA 9, Visual Studio 2013, GeForce 740M

Can someone help me, please?


Luis Gonçalves

  • uint *h_HistogramGPU = (uint *)malloc(256 * sizeof(uint) * numberhist);
    cudaMalloc((void **)&d_PartialHistograms, 240 * 256 * sizeof(uint) * numberhist);
    cudaMalloc((void **)&d_Histogram, 256 * sizeof(uint) * numberhist);

  • __global__ void histogram256Kernel(uint *d_PartialHistograms, uint *d_Data, uint dataCount)
    {
        // Handle to thread block group
        d_PartialHistograms += blockIdx.y * 240 * 256;
        d_Data += blockIdx.y * (dataCount / sizeof(uint));

  • __global__ void mergeHistogram256Kernel(
        uint *d_Histogram,
        uint *d_PartialHistograms,
        uint histogramCount
    )
    {
        d_PartialHistograms += blockIdx.y * 240 * 256;
        d_Histogram += 256 * blockIdx.y;
  • ***********************************************************+

  • assert(byteCount % sizeof(uint) == 0);

    dim3 grid1(PARTIAL_HISTOGRAM256_COUNT, numberhist, 1);
    histogram256Kernel<<<grid1, HISTOGRAM256_THREADBLOCK_SIZE>>>(
        d_PartialHistograms,
        (uint *)d_Data,
        byteCount
    );
    getLastCudaError("histogram256Kernel() execution failed\n");

    dim3 grid2(HISTOGRAM256_BIN_COUNT, numberhist, 1);
    mergeHistogram256Kernel<<<grid2, MERGE_THREADBLOCK_SIZE>>>(
        d_Histogram,
        d_PartialHistograms,
        PARTIAL_HISTOGRAM256_COUNT
    );
    getLastCudaError("mergeHistogram256Kernel() execution failed\n");
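
To tell which of the histograms come out wrong, the GPU output can be compared against a CPU reference. The following is a minimal sketch, not part of the sample: the names `histograms256CPU` and `firstMismatch` are hypothetical, and it assumes the inputs of the `numberhist` histograms are stored back to back, `byteCount` bytes each, which matches the `d_Data` offset used in the kernel above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: one 256-bin histogram per input chunk.
// Assumes chunk h occupies bytes [h * byteCount, (h + 1) * byteCount) of data.
// Returns an empty vector if data is too short for that layout.
std::vector<std::uint32_t> histograms256CPU(const std::vector<std::uint8_t> &data,
                                            std::size_t byteCount,
                                            std::size_t numberhist)
{
    if (data.size() < byteCount * numberhist)
        return {};

    // Histogram h is stored at hist[h * 256 .. h * 256 + 255],
    // the same layout as d_Histogram in the modified merge kernel.
    std::vector<std::uint32_t> hist(256 * numberhist, 0);
    for (std::size_t h = 0; h < numberhist; ++h)
        for (std::size_t i = 0; i < byteCount; ++i)
            ++hist[h * 256 + data[h * byteCount + i]];
    return hist;
}

// Returns the index of the first histogram whose 256 bins differ from the
// reference, or numberhist if every histogram matches.
std::size_t firstMismatch(const std::vector<std::uint32_t> &reference,
                          const std::uint32_t *gpuResult,
                          std::size_t numberhist)
{
    for (std::size_t h = 0; h < numberhist; ++h)
        for (std::size_t bin = 0; bin < 256; ++bin)
            if (reference[h * 256 + bin] != gpuResult[h * 256 + bin])
                return h;
    return numberhist;
}
```

After copying `d_Histogram` back into `h_HistogramGPU`, calling `firstMismatch` on the two buffers shows whether the errors start at a particular `blockIdx.y` or affect every histogram, which should help narrow down whether the offsets or the synchronization are at fault.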