Problem with Cumulative Histogram

I am trying to implement a cumulative histogram using CUDA C++, and as a result I am getting an array of 256 elements that are all zeros.

__global__ void cum_hist_kernel(int *cum_hist2, int *hist){
    int idx = threadIdx.x + blockIdx.x*blockDim.x;
    // perform cumulative sum
    if (idx < 256){
        cum_hist2[idx] = 0;
        for (int i = 0; i <= idx; i++){
            cum_hist2[idx] += hist[i];
        }
    }
}
int THREADS = 128;
int BLOCKS = (256+THREADS-1)/THREADS;
int *cum_hist2;
cudaMallocManaged(&cum_hist2, 256*sizeof(int));
cum_hist_kernel<<<BLOCKS, THREADS>>>(cum_hist2, hist_hist);
cudaDeviceSynchronize();

It’s possible, of course, that the error is in a part of the code you haven’t shown, so providing a minimal complete example may help others to help you. I also usually suggest using proper CUDA error checking; it’s good practice.
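A minimal sketch of what such error checking might look like. The macro name CUDA_CHECK and the helper check_last_kernel are hypothetical names chosen for illustration; they are not part of the CUDA runtime or of the posted code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (name is an assumption): evaluates a CUDA runtime
// call and aborts with a readable message if it did not return cudaSuccess.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical helper (name is an assumption): call right after a kernel launch.
inline void check_last_kernel() {
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
}
```

In the posted code this would mean wrapping each cudaMallocManaged call in CUDA_CHECK(...) and calling check_last_kernel() right after the kernel launch, in place of the bare cudaDeviceSynchronize(). A kernel that dereferences an invalid pointer would then typically be reported at the synchronize step instead of failing silently.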

I suggest running your code with compute-sanitizer. Are any errors reported?

As an aside, this isn’t a very efficient way to compute the result (it is no better than a serial approach), but that doesn’t explain an all-zero output for non-zero input.
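To make the comparison with a serial approach concrete: in the kernel, the thread for bin 255 performs 256 additions by itself, which is as many as a single serial pass over the histogram, and all threads together perform 256*257/2 = 32896 additions. Below is a sketch of a serial host-side reference, which can also be used to validate the GPU output. The function name cumulative_histogram_reference is a hypothetical name chosen for illustration.

```cpp
#include <numeric>
#include <vector>

// Host-side reference (hypothetical name): inclusive prefix sum, so that
// out[k] = hist[0] + hist[1] + ... + hist[k].
std::vector<int> cumulative_histogram_reference(const std::vector<int>& hist) {
    std::vector<int> out(hist.size());
    std::partial_sum(hist.begin(), hist.end(), out.begin());
    return out;
}
```

Comparing this result element by element against cum_hist2 after the kernel finishes is a quick way to confirm whether the kernel or its input is at fault.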

When I build a complete code out of what you have shown, I don’t see any errors and the output is not all zeros:

# cat t184.cu
#include <iostream>

__global__ void cum_hist_kernel(int *cum_hist2, int *hist){
    int idx = threadIdx.x + blockIdx.x*blockDim.x;
    // perform cumulative sum
    if (idx < 256){
        cum_hist2[idx] = 0;
        for (int i = 0; i <= idx; i++){
            cum_hist2[idx] += hist[i];
        }
    }
}

int main(){
  int *hist_hist;
  cudaMallocManaged(&hist_hist, 256*sizeof(int));
  for (int i = 0; i < 256; i++) hist_hist[i] = 1;

  int THREADS = 128;
  int BLOCKS = (256+THREADS-1)/THREADS;
  int *cum_hist2;
  cudaMallocManaged(&cum_hist2, 256*sizeof(int));
  cum_hist_kernel<<<BLOCKS, THREADS>>>(cum_hist2, hist_hist);
  cudaDeviceSynchronize();

  for (int i = 0; i < 10; i++) std::cout << cum_hist2[i] << " ";
  std::cout << std::endl;
}
# nvcc -o t184 t184.cu
# compute-sanitizer ./t184
========= COMPUTE-SANITIZER
1 2 3 4 5 6 7 8 9 10
========= ERROR SUMMARY: 0 errors
#

The input variable (hist_hist) that I was passing to the kernel was not in GPU memory.

So that’s the mistake I made. Thank you for the help.
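For readers who hit the same problem, here is a sketch of one way to make a host-side histogram available to the kernel: copy it into device memory with cudaMalloc and cudaMemcpy before the launch. The original fix was not posted, so this is an assumption about what it could look like. The wrapper name cum_hist_from_host is hypothetical, the input is assumed to hold at least 256 bins, and return codes are ignored here only to keep the sketch short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same kernel as in the question.
__global__ void cum_hist_kernel(int *cum_hist2, int *hist) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < 256) {
        cum_hist2[idx] = 0;
        for (int i = 0; i <= idx; i++) {
            cum_hist2[idx] += hist[i];
        }
    }
}

// Hypothetical wrapper: takes a histogram that lives in ordinary host memory
// (assumed to have at least 256 bins) and returns its cumulative histogram.
std::vector<int> cum_hist_from_host(const std::vector<int>& host_hist) {
    const int bins = 256;
    const size_t bytes = bins * sizeof(int);

    int *d_hist = nullptr;
    int *d_cum = nullptr;
    cudaMalloc(&d_hist, bytes);
    cudaMalloc(&d_cum, bytes);

    // Copy the input to the GPU so the kernel reads device memory.
    cudaMemcpy(d_hist, host_hist.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (bins + threads - 1) / threads;
    cum_hist_kernel<<<blocks, threads>>>(d_cum, d_hist);

    // This blocking copy runs after the kernel has finished on the default stream.
    std::vector<int> result(bins);
    cudaMemcpy(result.data(), d_cum, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_hist);
    cudaFree(d_cum);
    return result;
}
```

An alternative that stays closer to the posted code is to allocate hist_hist with cudaMallocManaged, as the complete example earlier in the thread does, and fill it on the host before launching the kernel.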
