Hello all!
So my previous question about the GPU compiler was a success! Thanks again for the help on that. Unfortunately, my results still don’t match: I should end up with an error of 16% but get 25% instead, so it’s a plain mismatch of results.
I think the issue has to do with a variable not being updated properly, or being trampled (overwritten). When I run on multicore with OpenMP I have no issues, but switching over to the GPU causes the problems. It may just be the nature of the beast, but I was wondering whether it has to do with how my code is written. Unfortunately, I can’t post that code, or at least not its main loops. I think I have the reductions right, since the code behaves correctly on multicore.
What I was wondering is whether it’s possible to have an outer CPU multicore loop with an inner GPU loop. Since I can’t post much, I’m basically asking whether I can keep running the working multicore code as it is and just offload a dot-product call in the middle of it to the GPU, and if so, what that command would look like. If that’s possible, I should be able to work from there. I just wanted to ask before spending a week finding out it won’t work, haha!
Thanks again! I do love the nvfortran compiler; the multicore results are indeed better than the ifort ones. Good job!