Problem with reduction code

Hi everyone.

I’m pretty new to CUDA, but have a fair amount of programming experience in other languages. I am trying to write a program that uses the midpoint rule to calculate the integral of an equation. This works nicely when I do the calculations on the GPU and then reduce on the CPU. It’s when I try to do the reduction on the GPU that my problems start.

My code attempts to reduce all of the threads in a block to a single sum for that block (the next step would be to get the blocks to sum). I’m finding that the code works for one block but fails with more than one, and I cannot figure out why, so I am looking for someone more experienced who may be able to point me in the right direction.

Many thanks for your assistance

I haven’t looked at all of the code, but the last line of your kernel should just read

result[blockIdx.x] = threadsResults[0];

Otherwise you’d write past the end of the allocated memory.
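
To put that line in context, here is a minimal sketch of a per-block reduction for the midpoint rule. I haven’t seen your full kernel, so the integrand f, the kernel name midpointPartialSums, the block size THREADS and the launch configuration are all my own assumptions; only result and threadsResults are names taken from your code. It also assumes result was allocated with exactly one element per block, which is why it is indexed by blockIdx.x.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed block size; the reduction loop below needs a power of two.
constexpr int THREADS = 256;

// Hypothetical integrand, chosen only for illustration: f(x) = x^2.
__device__ double f(double x) { return x * x; }

// Each block writes the sum of its threads' midpoint samples to result[blockIdx.x].
// Must be launched with blockDim.x == THREADS.
__global__ void midpointPartialSums(double a, double h, int n, double *result)
{
    __shared__ double threadsResults[THREADS];

    int tid = threadIdx.x;
    double sum = 0.0;

    // Grid-stride loop: thread accumulates f at the midpoints of its intervals.
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x)
        sum += f(a + (i + 0.5) * h);

    threadsResults[tid] = sum;
    __syncthreads();

    // Tree reduction in shared memory; every thread reaches each barrier.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            threadsResults[tid] += threadsResults[tid + stride];
        __syncthreads();
    }

    // One output slot per block, so index by blockIdx.x and nothing larger.
    if (tid == 0)
        result[blockIdx.x] = threadsResults[0];
}

int main()
{
    const double a = 0.0, b = 1.0;   // assumed integration limits
    const int n = 1 << 20;           // assumed number of intervals
    const int blocks = 64;           // assumed grid size
    const double h = (b - a) / n;

    double *dResult = nullptr;
    if (cudaMalloc(&dResult, blocks * sizeof(double)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    midpointPartialSums<<<blocks, THREADS>>>(a, h, n, dResult);

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<double> partial(blocks);
    cudaError_t err = cudaMemcpy(partial.data(), dResult, blocks * sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dResult);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Final step on the host: add the per-block sums and scale by the width h.
    double integral = 0.0;
    for (double p : partial)
        integral += p;
    integral *= h;

    printf("integral = %.8f\n", integral);
    return 0;
}
```

With f(x) = x^2 on [0, 1] the printed value should come out close to 1/3. Summing the per-block values on the host is just the simplest way to finish; a second kernel launch or an atomic add would be alternatives once the per-block part works.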