thread private common block

pilot117 · August 8, 2012, 9:32pm

If a common block is declared threadprivate, can I use OpenACC to update the arrays defined in that common block?

I ran some tests, and it seems that whenever the common block is threadprivate, any update by an OpenACC kernel results in a runtime error:

line xxxx: cudaEventSynchronize returned status 4: unspecified launch failure

thanks

MatColgrove · August 8, 2012, 10:41pm

Hi pilot117,

There are a couple of features not yet available. Threadprivate is one, host_data and device resident the others.

Mat

pilot117 · August 8, 2012, 11:08pm

Hi Mat, many thanks for the answer. What do you mean by “host_data and device resident the others”?

Btw, could you explain the reason a little bit more?

From my understanding, each thread has its own copy of a threadprivate common block. When I do something like this:

      subroutine mysub() 
      integer N,i 
      parameter (N=1048576) 
      common/blk/t1(N),t2(N),t3(N) 
c$omp threadprivate (/blk/)
!$acc kernels 
!$acc loop 
      do i=1,N 
         t2(i)=t1(i)*t1(i)+t2(i)*t3(N) 
      enddo 
!$acc end kernels 
      return 
      end

will OpenACC see different copies of t2, or do the multiple copies cause the problem? Reading t1, t2, and t3 seems to be OK; it is the writing that causes the problem. Even when I use only 1 thread, I still get the same runtime error, which suggests the cause lies somewhere deep inside the OpenACC implementation. If you could explain it or direct me to some reference, that would be great!
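To make the "each thread has its own copy" assumption concrete, here is a minimal host-only sketch. It assumes an OpenMP-enabled Fortran compiler and involves no OpenACC at all; the program name tpdemo, the common block /demo/, and the variables id and nmatch are hypothetical and are not taken from the test case in this thread.

```fortran
      program tpdemo
      use omp_lib
      implicit none
      integer :: id, nmatch
      common /demo/ id
!$omp threadprivate(/demo/)

      nmatch = 0
!$omp parallel reduction(+:nmatch)
      ! Each thread stores its own thread number in its copy of id.
      id = omp_get_thread_num()
!$omp barrier
      ! After every thread has written, each thread checks that its
      ! copy still holds its own number. If id were one shared
      ! variable, the writes would overwrite each other.
      if (id == omp_get_thread_num()) nmatch = nmatch + 1
!$omp end parallel

      print *, 'threads that kept their own value:', nmatch
      end program tpdemo
```

If the common block really is per-thread, the printed count should equal the number of threads in the team. This only illustrates the OpenMP side; it says nothing about how an OpenACC compute region chooses which copy to use.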

MatColgrove · August 10, 2012, 4:08pm

Hi pilot117,

I didn’t realize you meant the OpenMP threadprivate. OpenACC also has a “threadprivate” directive, but it’s still in development.

I’m away at a conference, so I will ask one of our other application engineers to investigate using an OpenMP threadprivate variable within an OpenACC compute region. I’ve not tried it before.

Mat

pilot117 · August 10, 2012, 4:19pm

Many thanks! Here are my example and compile command in case you need a quick test case:

      subroutine mysub() 
      integer N,i 
      parameter (N=1048576) 
      common/blk/t1(N),t2(N),t3(N) 
c$omp threadprivate (/blk/)
!$acc update device(t1,t2,t3) 
!$acc kernels 
!$acc loop 
      do i=1,N 
         t2(i)=t1(i)*t1(i)+t2(i)*t3(N) 
      enddo 
!$acc end kernels 
!$acc update host(t2) 
      return 
      end 

      program mainTest 
      integer N,i 
      parameter (N=1048576) 
      real t1(1:N),t2(1:N),t3(1:N) 
      common/blk/t1,t2,t3 

!$acc mirror(t1,t2,t3) 

      do i=1,N 
         CALL RANDOM_NUMBER(HARVEST=X) 
         t1(i)=X 
         CALL RANDOM_NUMBER(HARVEST=X) 
         t2(i)=X 
         CALL RANDOM_NUMBER(HARVEST=X) 
         t3(i)=X 
      enddo 
  

      do j=1,10 
      call mysub() 
      end do 

      end program mainTest

The compile command:

pgf90 -o test -acc -mp -ta=nvidia:cc2.0,time -Minfo=accel -Mcuda -Mvect simpleTest.f

Here is my output:

pilot@mars:~/Codes/test$ ./test 
line 9: cudaEventSynchronize returned status 4: unspecified launch failure

Accelerator Kernel Timing data
/home/pilot/Codes/test/simpleTest.f
  mysub
    7: region entered 1 time
        time(us): init=0
                  data=12
        9: kernel launched 1 times
            grid: [8192]  block: [128]
            time(us): total=0 max=0 min=0 avg=0
/home/pilot/Codes/test/simpleTest.f
  mysub
    6: region entered 1 time
        time(us): init=0
                  data=3,441
/home/pilot/Codes/test/simpleTest.f
  maintest
    23: region entered 1 time
        time(us): init=135,083

new2CUDA · August 13, 2012, 5:32am

Hi Mat,

I am not using PGI Accelerator. I am programming in CUDA Fortran and OpenMP. Is the threadprivate feature supported in CUDA Fortran? I am using PGI version 12.6.

Thanks in advance.

new2CUDA · February 27, 2013, 12:20pm

Hi mkcolg,

Can I declare a device allocatable array as threadprivate in CUDA Fortran? If yes, could you please show how with an example? I am using PGI version 12.9.

If not, how can I map a CPU array declared threadprivate to a GPU array?
Is there any direct mechanism to do that?

Thanks in advance.
