Hello All,
I wonder if anyone can help me out with this.
I want to implement a parent kernel that calculates one metric from a single time series (around 2000 time points), and a child kernel that replaces a for loop: the outer loop runs over 0-1998, and each iteration has an inner loop from i+1 to 2000. Each inner iteration increments a count of occurrences if a certain condition is met. My pseudocode is:
__device__ void numcount(float *dat, int loc, int rows, int len, int *count)
{
    if (threadIdx.x > rows - 1) return;
    int start = threadIdx.x;
    // take the threadIdx.x -th row from the
    int j;
    for (j = start + 1; j < len; j++)
    {
        if (…)
            (*count)++;
    }
}

__global__ void kernel(float *data, int loc, int rows, int len)
{
    …
}
My question is: how do I avoid conflicts when many threads increment the same global variable count? In OpenMP this can be done with a reduction, but how can I implement something similar in CUDA?
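One way to get the effect of an OpenMP reduction is to stop incrementing the shared counter on every match: each thread counts into a private variable, the block sums those private counts in shared memory, and a single atomicAdd per block adds the block total to the global counter. The sketch below is a plain kernel (no dynamic parallelism) with one thread per outer index i. The match condition `fabsf(a - b) < r`, the tolerance `r`, and the names `is_match`, `count_pairs` and `count_pairs_gpu` are hypothetical stand-ins for the elided `if…`, not the original code.

```cuda
#include <cmath>
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical match condition standing in for the elided "if…" in the question.
__device__ bool is_match(float a, float b, float r)
{
    return fabsf(a - b) < r;
}

// One thread per outer index i; the inner j loop runs inside that thread.
// Matches are counted in a private register, summed per block in shared
// memory, and thread 0 adds the block total with a single atomicAdd.
__global__ void count_pairs(const float *data, int len, float r, int *count)
{
    extern __shared__ int partial[];             // blockDim.x ints, sized at launch
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    int local = 0;                               // private: no write conflicts
    if (i < len) {
        float xi = data[i];
        for (int j = i + 1; j < len; j++)
            if (is_match(xi, data[j], r)) local++;
    }
    partial[threadIdx.x] = local;                // every thread writes, even if i >= len
    __syncthreads();

    // Tree reduction in shared memory; assumes blockDim.x is a power of two.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) atomicAdd(count, partial[0]);
}

// Hypothetical host helper: returns the number of matching pairs (i < j),
// or -1 if any CUDA call fails.
int count_pairs_gpu(const float *h_data, int len, float r)
{
    const int threads = 256;                     // power of two, at most 1024
    const int blocks = (len + threads - 1) / threads;
    float *d_data = nullptr;
    int *d_count = nullptr;
    int h_count = -1;

    cudaError_t err = cudaMalloc(&d_data, len * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&d_count, sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_data, h_data, len * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(d_count, 0, sizeof(int)); // counter starts at 0
    if (err == cudaSuccess) {
        count_pairs<<<blocks, threads, threads * sizeof(int)>>>(d_data, len, r, d_count);
        err = cudaGetLastError();                // launch/configuration errors
    }
    if (err == cudaSuccess)                      // this copy also waits for the kernel
        err = cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_data);                            // cudaFree(nullptr) is a no-op
    cudaFree(d_count);
    return err == cudaSuccess ? h_count : -1;
}
```

For a 2000-point series, `count_pairs_gpu(h_data, 2000, r)` would return the number of matching pairs. Calling atomicAdd on every match would also be correct; the private-count-then-reduce pattern just keeps the number of atomic operations on the single counter down to one per block.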
Here is my simplified child kernel. fsum is a float array allocated with cudaMalloc, and vm is an integer array, also allocated with cudaMalloc.
__global__ void vecmatch_row(float *fsum, int *vm)
{
    int idx = threadIdx.x;
    if ...
    {
        atomicAdd( &(fsum + idx) , 1.5); // just for testing
        atomicAdd( &(vm+idx), 1);
    }
    } // end of the j loop
}
The code runs, but nothing gets added to the memory. If I change
atomicAdd( &(fsum + idx) , 1.5);
to
atomicAdd( (fsum + idx) , 1.5);
I get a CUDA error: invalid configuration argument.
Any further advice?
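For reference, `&(fsum + idx)` tries to take the address of a temporary pointer value, which standard C++ rejects (the operand of `&` must be an lvalue); atomicAdd wants the address of the element itself, i.e. `fsum + idx` or the equivalent `&fsum[idx]`. The "invalid configuration argument" message (cudaErrorInvalidConfiguration) is reported for the kernel launch, for example when a block has more than 1024 threads or a grid dimension is zero, so it points at the launch configuration rather than at atomicAdd. A minimal sketch, assuming a bounds parameter `n` and a helper `run_vecmatch` that are hypothetical additions:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Simplified vecmatch_row: each thread adds to its own element.
// atomicAdd takes a plain device pointer; fsum + idx and &fsum[idx] are equivalent.
__global__ void vecmatch_row(float *fsum, int *vm, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {                        // guard: the grid may have extra threads
        atomicAdd(&fsum[idx], 1.5f);      // 1.5f selects the float overload
        atomicAdd(vm + idx, 1);
    }
}

// Hypothetical helper: launches vecmatch_row over n elements with the given
// block size and returns the first CUDA error from allocation, launch or copy-back.
cudaError_t run_vecmatch(int n, int threads, float *h_fsum, int *h_vm)
{
    float *fsum = nullptr;
    int *vm = nullptr;
    cudaError_t err = cudaMalloc(&fsum, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&vm, n * sizeof(int));
    if (err == cudaSuccess) err = cudaMemset(fsum, 0, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMemset(vm, 0, n * sizeof(int));

    if (err == cudaSuccess) {
        int blocks = (n + threads - 1) / threads;
        vecmatch_row<<<blocks, threads>>>(fsum, vm, n);
        // More than 1024 threads per block (or a zero dimension) shows up here
        // as cudaErrorInvalidConfiguration, "invalid configuration argument".
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_fsum, fsum, n * sizeof(float), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_vm, vm, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(fsum);                       // cudaFree(nullptr) is a no-op
    cudaFree(vm);
    return err;
}
```

With `threads = 128` every element ends up at 1.5f and 1; with `threads = 2048` the launch itself fails and nothing is written, which matches the symptom of an unchanged buffer when the error is not checked.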
Thanks
I still can’t get the number updated. fsum is defined inside the main function and allocated using cudaMalloc; do I need to define it as a global variable outside of main?
Could I do it this way?
No, that won’t work. cudaMalloc cannot allocate a pointer that is located in device memory.
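To illustrate the point about cudaMalloc: the usual pattern is to keep the pointer in main and pass it to the kernel as an argument, so no global variable is needed. If a `__device__` pointer variable is really wanted, one option is to cudaMalloc into an ordinary host-side pointer and then copy that pointer value into the device variable with cudaMemcpyToSymbol. The names `d_global_fsum`, `add_via_arg`, `add_via_global` and `demo_pointer_passing` are hypothetical, and error handling is kept minimal.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// A pointer variable that lives in device memory (hypothetical name).
// cudaMalloc cannot allocate into it directly from host code.
__device__ float *d_global_fsum;

// Usual approach: the device address arrives as a kernel argument.
__global__ void add_via_arg(float *fsum, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomicAdd(&fsum[idx], 1.5f);
}

// Alternative: the kernel reads the address from the __device__ variable.
__global__ void add_via_global(int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomicAdd(&d_global_fsum[idx], 1.5f);
}

// Hypothetical helper: allocates fsum in host code (as in main), runs both
// kernels on it, and returns element 0 (3.0f if both updates landed), or -1.0f on error.
float demo_pointer_passing(int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *fsum = nullptr;                       // host variable holding a device address
    float first = -1.0f;

    cudaError_t err = cudaMalloc(&fsum, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMemset(fsum, 0, n * sizeof(float));
    if (err == cudaSuccess) {
        add_via_arg<<<blocks, threads>>>(fsum, n);
        err = cudaGetLastError();
    }
    // Copy the pointer value (not the array contents) into the __device__ variable.
    if (err == cudaSuccess)
        err = cudaMemcpyToSymbol(d_global_fsum, &fsum, sizeof(float *));
    if (err == cudaSuccess) {
        add_via_global<<<blocks, threads>>>(n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(&first, fsum, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(fsum);
    return err == cudaSuccess ? first : -1.0f;
}
```

Both kernels update the same buffer, so element 0 reads back as 3.0f. Passing the pointer as an argument is usually the simpler choice; the `__device__` variable only helps when many kernels (or device-side launches) need the same address without it being threaded through every call.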
I think you probably have other errors in your code: you are fixated on the idea that the atomic update is not working, when in fact something else in your code is failing.
I encourage you to use proper CUDA error checking, run your code with cuda-memcheck, and use a methodical debug approach that tackles the problems the tools are telling you, rather than jumping to the conclusion that somehow the atomic update is not working.
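A sketch of what "proper CUDA error checking" can look like: wrap every runtime API call in a macro that reports the error string with file and line, and after each kernel launch check both `cudaGetLastError()` (launch and configuration errors) and the result of a synchronizing call (errors while the kernel ran). The macro names `CUDA_CHECK` and `CUDA_CHECK_KERNEL` and the helper `checked_fill_and_sum` are hypothetical; the runtime calls themselves are standard.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cassert>
#include <cuda_runtime.h>

// Wrap a CUDA runtime call; on failure print the error string and location, then exit.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);       \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// After a launch: catch configuration errors, then errors raised during execution.
#define CUDA_CHECK_KERNEL()                                              \
    do {                                                                 \
        CUDA_CHECK(cudaGetLastError());                                  \
        CUDA_CHECK(cudaDeviceSynchronize());                             \
    } while (0)

__global__ void fill_ones(int *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = 1;
}

// Hypothetical example: every runtime call and the launch are checked, so a
// failure is reported where it happens instead of as a silently unchanged buffer.
int checked_fill_and_sum(int n)
{
    int *d_v = nullptr;
    CUDA_CHECK(cudaMalloc(&d_v, n * sizeof(int)));
    fill_ones<<<(n + 255) / 256, 256>>>(d_v, n);
    CUDA_CHECK_KERNEL();

    int *h_v = (int *)malloc(n * sizeof(int));
    CUDA_CHECK(cudaMemcpy(h_v, d_v, n * sizeof(int), cudaMemcpyDeviceToHost));
    int sum = 0;
    for (int i = 0; i < n; i++) sum += h_v[i];
    free(h_v);
    CUDA_CHECK(cudaFree(d_v));
    return sum;
}
```

With this in place, run the binary under cuda-memcheck (`cuda-memcheck ./app`); on CUDA 12 and later cuda-memcheck has been removed and `compute-sanitizer ./app` is its replacement. Synchronizing after every launch is a debugging aid and can be relaxed once the code works.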