Hello,
I am trying to implement a strided dot product function. I want it to update a GPU variable so that, once the striding is done, the full sum of the dot product is stored in that single variable.
Currently, I first reset the value at the GPU variable to 0 using a separate kernel, then wait for the threads to sync before moving on to the dot product calculation.
Below I include the code that works when the block runs on a single thread, followed by the code for multiple threads, which does not work.
In both cases the device function has the following signature:
__device__ void device_dotProduct(double* list1, double* list2, double* _value, double* _target, int arrayLength)
Single-thread code (working):
int idx = threadIdx.x;
int stride = 1;
double target = 0.0;
for (int i = 0; i < arrayLength; i += 1) {
    target += list1[i] * list2[i];
}
*_target = target;
__syncthreads();
Multithreaded code (not working):
int idx = threadIdx.x;
double target = 0.0;
for (int i = idx; i < arrayLength; i += 128) {
    target += list1[i] * list2[i];
}
double sum = *_target + target;
double current_target = *_target;
__syncthreads();
*_target = current_target + target;
__syncthreads();
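For comparison, here is a minimal sketch of one common way to avoid the problem in the multithreaded version, where every thread reads the same old value of *_target and then overwrites it, so only one thread's partial sum survives. The sketch has each thread store its partial sum in shared memory and then combines the partial sums with a tree reduction. It is not a definitive fix and rests on several assumptions: the block has exactly 128 threads (matching the stride above), only one block is launched, and the unused _value parameter is dropped. BLOCK_SIZE, dotKernel and dot_on_gpu are hypothetical names introduced only for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block size; it must match the launch configuration and be a power of two.
constexpr int BLOCK_SIZE = 128;

// Each thread accumulates a strided partial sum, then the block combines the
// partial sums in shared memory and thread 0 writes the final result.
__device__ void device_dotProduct(const double* list1, const double* list2,
                                  double* _target, int arrayLength)
{
    __shared__ double partial[BLOCK_SIZE];
    int idx = threadIdx.x;

    double target = 0.0;
    for (int i = idx; i < arrayLength; i += BLOCK_SIZE) {
        target += list1[i] * list2[i];
    }
    partial[idx] = target;
    __syncthreads();  // all partial sums are in shared memory past this point

    // Tree reduction: halve the number of active threads at every step.
    for (int offset = BLOCK_SIZE / 2; offset > 0; offset /= 2) {
        if (idx < offset) {
            partial[idx] += partial[idx + offset];
        }
        __syncthreads();
    }

    if (idx == 0) {
        *_target = partial[0];  // a single writer, so no race on *_target
    }
    __syncthreads();
}

// Hypothetical wrapper kernel so the device function can be launched.
__global__ void dotKernel(const double* a, const double* b, double* out, int n)
{
    device_dotProduct(a, b, out, n);
}

// Hypothetical host helper: copies the inputs over, launches one block,
// and copies the result back. Error checking is omitted for brevity.
double dot_on_gpu(const std::vector<double>& a, const std::vector<double>& b)
{
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(double);
    double *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    double result = 0.0;

    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, sizeof(double));
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    dotKernel<<<1, BLOCK_SIZE>>>(dA, dB, dOut, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, dOut, sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return result;
}
```

Because thread 0 overwrites *_target rather than adding to it, the separate reset kernel would not be needed in this variant. An alternative that keeps the accumulate-into-*_target structure would be atomicAdd on the double, which as far as I know requires compute capability 6.0 or newer.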