I have a question about reading and writing memory in an unstructured manner in an accelerator region:
I noticed that if I read the memory of an array (A3_GPU) in a loop in an unordered way, then the loop is still parallelizable and the performance is not that bad.
Q1 will have unordered values, e.g.:
i=1 : Q1=2345
i=2 : Q1=12
i=3 : Q1=18474
and so on…
!reading unstructured
!$acc region
do i = 1,100000
  Q1 = A1_GPU(KP,1)
  A2_GPU(i,3) = A3_GPU(Q1,1)
end do
!$acc end region
But when I want to write to an array (B2_GPU) with an unstructured pattern, then the compiler forces the loop to execute sequentially on the device (!$acc do seq), which gives me very bad performance.
The loop looks like the following, and K1 is unordered, e.g.:
i=1 : K1=2345
i=2 : K1=12
i=3 : K1=18474
!writing unstructured
!$acc region
do i = 1,100000
  K1 = B1_GPU(KP,1)
  B2_GPU(K1,3) = B3_GPU(i,1)
end do
!$acc end region
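My guess at why the compiler is cautious with the write loop above: it cannot prove that K1 never repeats, and if two iterations wrote to the same element, the result would depend on the order in which they run. The following stand-alone sketch only illustrates that assumption. It is plain Fortran without directives, and the names idx, src, fwd and bwd are made up for the example, not taken from my real code:

```fortran
! Hypothetical sketch (not the original code): a scatter write is
! order-dependent when the index array contains duplicates.
program scatter_order
  implicit none
  integer, parameter :: n = 4
  integer :: idx(n), src(n), fwd(n), bwd(n), i

  idx = (/ 2, 1, 2, 3 /)      ! index 2 appears twice
  src = (/ 10, 20, 30, 40 /)
  fwd = 0
  bwd = 0

  ! Forward order: the last write to fwd(2) comes from i = 3.
  do i = 1, n
    fwd(idx(i)) = src(i)
  end do

  ! Reverse order: the last write to bwd(2) comes from i = 1.
  do i = n, 1, -1
    bwd(idx(i)) = src(i)
  end do

  ! fwd(2) and bwd(2) differ, so the iterations are not independent.
  print *, 'forward:', fwd
  print *, 'reverse:', bwd
end program scatter_order
```

In my case I believe the K1 values are all distinct, so the order should not matter, but the compiler has no way of knowing that from the loop alone.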
Is there any workaround? Or just a possibility to tune such a loop?
What does the "width" mean if I use the directive !$acc do seq [(width)]?
Copying the data to the host, executing the loop on the CPU and copying it back to the device is not an option; I guess this would take more time.
Thank you very much!