Hello All,
I am new to this forum, so ‘hi’ to everyone. I would appreciate help with what is probably a simple problem, but one that is giving me trouble. I am writing CUDA code that operates on an array, and it sometimes returns wrong results, depending on the size of the problem. I think it could be a race condition but am not sure.
Could anyone look at the sample code and let me know if there is an obvious problem?
Thank you.
#define N 1024
#define TPB 512
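__global__ void func(float4 *a, float *res, int n) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;  // one thread per element
    if (idx < n) {
        // visit each pair (idx, j) with j > idx exactly once
        for (int j = idx + 1; j < n; ++j) {
            res[idx] += a[idx].w * (a[idx].x - a[j].x);  // update this thread's element
            res[j] += a[j].w * (a[j].x - a[idx].x);      // update another thread's element
        }
    }
}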
float4 *A; // fill with data before launching the GPU kernel
...
float *result = new float[N];
...
func<<<N/TPB, TPB>>>(dev_A, dev_res, N);
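For comparison, here is a minimal sketch of a race-free variant, assuming the intended result is res[i] = sum over all j ≠ i of a[i].w*(a[i].x - a[j].x), which is what the pairwise loop computes when run serially. In the kernel above, thread idx performs `res[j] += ...` while thread j performs `res[idx'] += ...` and `res[j] += ...` on its own element, and these read-modify-write updates are not atomic, so increments can be lost. How often that happens depends on how many threads are in flight, which would fit errors that change with problem size. The sketch replaces the scatter with a gather: each thread loops over every j and writes only its own element once. The names `func_gather` and `gather_max_abs_error` and the generated test data are hypothetical, not from the original code.

```cuda
#include <cuda_runtime.h>
#include <cassert>  // used by the example check below
#include <cmath>
#include <vector>

// Gather formulation (an assumption about the intended math): thread i is the
// only thread that writes res[i]. The j == i term is zero, so summing over all j
// matches the pairwise loop, although it does about twice the arithmetic.
__global__ void func_gather(const float4 *a, float *res, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {
        float xi = a[i].x;
        float wi = a[i].w;
        float sum = 0.0f;                 // accumulate in a register, not global memory
        for (int j = 0; j < n; ++j) {
            sum += wi * (xi - a[j].x);
        }
        res[i] = sum;                     // single plain store: no race, res need not be pre-zeroed
    }
}

// Hypothetical test helper: builds n float4s with made-up values, runs
// func_gather, and returns the largest |GPU - CPU| difference.
float gather_max_abs_error(int n, int tpb = 512) {
    std::vector<float4> h_a(n);
    for (int i = 0; i < n; ++i)
        h_a[i] = make_float4(0.001f * (i % 97), 0.0f, 0.0f, 1.0f + 0.01f * (i % 7));

    // CPU reference: the original pairwise loop, run serially on the host.
    std::vector<float> ref(n, 0.0f);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            ref[i] += h_a[i].w * (h_a[i].x - h_a[j].x);
            ref[j] += h_a[j].w * (h_a[j].x - h_a[i].x);
        }

    float4 *d_a = nullptr;
    float *d_res = nullptr;
    cudaMalloc(&d_a, n * sizeof(float4));
    cudaMalloc(&d_res, n * sizeof(float));
    cudaMemcpy(d_a, h_a.data(), n * sizeof(float4), cudaMemcpyHostToDevice);

    int blocks = (n + tpb - 1) / tpb;     // round up so n need not be a multiple of tpb
    func_gather<<<blocks, tpb>>>(d_a, d_res, n);

    cudaError_t err = cudaGetLastError();            // launch-configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors during execution

    std::vector<float> h_res(n, 0.0f);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_res.data(), d_res, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_res);
    if (err != cudaSuccess) return INFINITY;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i)
        worst = std::fmax(worst, std::fabs(h_res[i] - ref[i]));
    return worst;
}
```

Calling `gather_max_abs_error(1000, 256)` should return a small value (float rounding only), since the GPU and CPU sum in different orders. Another option that keeps the pairwise loop is to replace both `+=` updates with `atomicAdd(&res[...], ...)`, with `res` zeroed before the launch; that removes the race but serializes threads that hit the same element.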