Thread data concatenation

Would it be possible to provide guidance on thread data concatenation with OpenACC?

I am using OpenACC to accelerate an N-body problem. My computational domain is subdivided into a large number of subvolumes, and I have a number of points within the domain. I am essentially building a list of which points are in each subvolume: I loop through all the points and, for each point, determine which subvolume it is in. I have a three-dimensional array, NPIZPL, which stores the number of points in each subvolume, and an array, ZPLIST, which lists the point indices.

The issue is that each thread appears to have its own instance of NPIZPL and ZPLIST. I would like to find a way of joining these together, i.e. concatenating them. The points do not need to be in any particular order.

I have simplified my code and pasted it below. It looks at only one subvolume (index 3,3,3) and 10 points, each of which is assumed to lie in subvolume 3,3,3.

On CPU the output is NPIZPL = 10 and ZPLIST = 1,2,3,4,5,6,7,8,9,10

On GPU the output is NPIZPL = 1 and ZPLIST = a single integer between 1 and 10.

I would be grateful for any advice you could provide.

Tim.

! --------------------------------------------------------------------
!
!      this sub creates zone particle list
!
! --------------------------------------------------------------------  
      subroutine gpu_zone_data_test()         
      
      use memory_allocation  

! --------------------------------------------------------------------
! start off by zero-ing arrays

      !$acc kernels          
       NPIZPL(:,:,:) = 0.00
       ZPLIST(:,:,:,:) = 0.00 
       
      !$acc end kernels

! --------------------------------------------------------------------
      
      !$acc parallel loop 
      do p = 1,10   
          
        NPIZPL(3,3,3) = NPIZPL(3,3,3) + 1
        
        ZPLIST(3,3,3,NPIZPL(3,3,3)) = p 
      
      enddo ! p   
            
      !$acc wait         
          
      !$acc update host(NPIZPL,ZPLIST)
      
      print*,'---------333--------------'
      print*,' '
      print*,NPIZPL(3,3,3)
      print*,' '
      print*,ZPLIST(3,3,3,1:NPIZPL(3,3,3))
      print*,' '         
!   -----------------------------------------------------------------
 
      return
      end

Hi Tim,

What’s happening is that all the threads are reading “NPIZPL(3,3,3)” at the same time, so they’re all getting “0.0”. They then all store the value “1.0” back into the array.

What you need are atomic operations, so that each thread’s read-modify-write of the counter completes as a single step and the result is visible to the other threads.

       !$acc parallel loop
       do p = 1,10
!$acc atomic update
         NPIZPL(3,3,3) = NPIZPL(3,3,3) + 1

         ZPLIST(3,3,3,NPIZPL(3,3,3)) = p
       enddo ! p

Though, this still isn’t correct, in that another thread could change the value of “NPIZPL(3,3,3)” between the atomic update and its use as the index. Hence, I’d use a temporary variable to store the index. Something like:

       !$acc parallel loop
       do p = 1,10
!$acc atomic capture
         NPIZPL(3,3,3) = NPIZPL(3,3,3) + 1
         idx = NPIZPL(3,3,3)
!$acc end atomic

         ZPLIST(3,3,3,idx) = p
       enddo ! p

Note that you will need PGI version 14.9 or above since that’s when “capture” was first supported.
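
For reference, here is a minimal, self-contained sketch of the same pattern generalised to per-point subvolume indices. It is only an illustration under stated assumptions, not the original application code: the program name, the sizes npts, nz and maxp, and the index arrays ix, iy and iz are all hypothetical, the arrays are assumed to be integer, and it is written in free-form Fortran. Only NPIZPL, ZPLIST and the atomic capture construct come from the code above.

```fortran
! Minimal sketch: build a per-subvolume point list with atomic capture.
! npts, nz, maxp, ix, iy, iz and idx are illustrative assumptions;
! only NPIZPL and ZPLIST are names taken from the thread.
program zone_list_sketch
  implicit none
  integer, parameter :: npts = 10   ! number of points (assumed)
  integer, parameter :: nz   = 4    ! subvolumes per dimension (assumed)
  integer, parameter :: maxp = 16   ! max points stored per subvolume (assumed)
  integer :: NPIZPL(nz,nz,nz)
  integer :: ZPLIST(nz,nz,nz,maxp)
  integer :: ix(npts), iy(npts), iz(npts)
  integer :: p, idx

  ! Mirror the simplified test case: every point lies in subvolume (3,3,3).
  ix = 3
  iy = 3
  iz = 3

  NPIZPL = 0
  ZPLIST = 0

  !$acc parallel loop private(idx) copy(NPIZPL, ZPLIST) copyin(ix, iy, iz)
  do p = 1, npts
     ! Increment the counter and capture the new value in one atomic step,
     ! so that each point is handed a unique slot in the list.
     !$acc atomic capture
     NPIZPL(ix(p),iy(p),iz(p)) = NPIZPL(ix(p),iy(p),iz(p)) + 1
     idx = NPIZPL(ix(p),iy(p),iz(p))
     !$acc end atomic

     ! idx is private to this iteration, so this store cannot collide.
     ZPLIST(ix(p),iy(p),iz(p),idx) = p
  end do

  print *, NPIZPL(3,3,3)
  print *, ZPLIST(3,3,3,1:NPIZPL(3,3,3))
end program zone_list_sketch
```

With this sketch the count for subvolume (3,3,3) should be 10, and the list should hold the indices 1 to 10, though on the GPU their order is not guaranteed. The captured index must also stay within the last dimension of ZPLIST, so in a real code that dimension needs to be at least as large as the most points any one subvolume can hold.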

Hope this helps,
Mat

Hi Mat,

Many thanks for this!

Tim.

Hi Mat,

Sorry for getting back to you on this - after a very long break.

The code you sent me does not seem to work. I get a compiler message: undefined reference to ‘atomicaddi’

It looks to me like it cannot handle atomic capture on arrays but is capable of handling scalar variables. Does this make sense?

Could you offer any suggestions as to how to overcome this?

I am using version 14.10 by the look of it. Would it be worth upgrading to the latest release (15.3?)?

Many thanks

Tim.

Hi Tim,

‘atomicaddi’ is actually the host version of the routine. Your workaround would be to compile just for the device, “-ta=tesla”, or add the library “-lcudadevice”.

We corrected this in 15.1 and now add the library to the link by default.

- Mat

Hi,

Thanks for the reply. I now have an issue where compilation of the code hangs rather than giving me an error message. A bit of testing suggests that the compiler does not like the following lines:

     !$acc kernels         
       NPIZPL(:,:,:) = 0.00
       ZPLIST(:,:,:,:) = 0.00
       
      !$acc end kernels

Any suggestions?

Thanks in advance.

Hi Tim,

That should work, so I suspect something else is going on. Can you post a reproducing example, or send one to PGI Customer Service (trs@pgroup.com)?

Thanks,
Mat

Hi,

I have sent a reproducing example to you.

I have also been able to get the code to compile by removing ‘-Mbounds’ from the compile command.

Thanks

Tim.