Two CUDA Fortran accelerated executables give different results

Hello Mat!

I wrote a multi-block, single-thread version of an optimization algorithm that optimizes the parameters of a hydrological model. It produces the correct answer.

However, when I revised it into a multi-block, multi-thread version, it runs but gives the wrong answer.

In the multi-thread version, the best solution (bestf) found by the algorithm does not seem to improve sufficiently from one evolution step to the next. In the single-thread version, the evolution process works fine.

Because the multi-thread version needs many syncthreads() calls and makes heavy use of shared memory, I’m not sure whether I am using the synchronization and the shared memory properly.
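
For illustration only, here is a minimal sketch of the kind of block-level reduction where this matters. It is not the actual code from either version: the module name reduce_sketch, the kernel block_min, the arrays f and best_blk, the block size of 128 threads, and the data layout (n objective values per block) are all assumptions. It shows where syncthreads() has to go when each block reduces its candidates to one minimum in shared memory.

```fortran
module reduce_sketch
  use cudafor
  implicit none
  integer, parameter :: NTHREADS = 128   ! assumed block size, must be a power of two
contains
  ! Hypothetical kernel: each block finds the minimum of its own n values.
  attributes(global) subroutine block_min(f, best_blk, n)
    real, intent(in)  :: f(*)          ! objective values, n per block
    real, intent(out) :: best_blk(*)   ! one minimum per block
    integer, value    :: n
    real, shared :: s(NTHREADS)        ! per-block scratch space
    integer :: tid, i, stride
    real :: local

    tid = threadIdx%x
    local = huge(local)
    ! Each thread scans its share of the block's values.
    do i = tid, n, blockDim%x
      local = min(local, f((blockIdx%x - 1)*n + i))
    end do
    s(tid) = local
    call syncthreads()   ! all partial minima must be stored before anyone reads them

    stride = blockDim%x / 2
    do while (stride >= 1)
      if (tid <= stride) s(tid) = min(s(tid), s(tid + stride))
      ! Outside the IF: every thread of the block must reach the barrier.
      call syncthreads()
      stride = stride / 2
    end do

    if (tid == 1) best_blk(blockIdx%x) = s(1)
  end subroutine block_min
end module reduce_sketch

program demo
  use cudafor
  use reduce_sketch
  implicit none
  integer, parameter :: nblocks = 4, n = 1000
  real :: f(nblocks*n), best(nblocks)
  real, device :: f_d(nblocks*n), best_d(nblocks)

  call random_number(f)
  f_d = f
  call block_min<<<nblocks, NTHREADS>>>(f_d, best_d, n)
  best = best_d
  ! Compare with the per-block minimum computed on the host.
  print *, 'max difference from host result:', &
           maxval(abs(best - minval(reshape(f, [n, nblocks]), dim=1)))
end program demo
```

In a pattern like this, the usual mistakes are a missing syncthreads() between writing and reading shared memory, or a syncthreads() placed inside a branch that only some threads of the block execute.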

I have sent both versions to the service e-mail address. Could you please help me find out why the results differ?

Thanks in advance!