Monte Carlo Tutorial: Incorrect results on a specific platform

Where no version is specified below, I believe I am using the 10.6 compilers (64-bit).

The accelerator example generates correct results on Machine 1 (Win-7 x86-64 / PVF 64-bit / GTX480):

----- ACC -----
Result = 3.135702
Standard deviation = 5.0239276E-05
Difference from real PI value = 5.8903694E-03
Time in Seconds
Total : 20.59500
RNG : 16.11200
Compute : 0.49100
Data Xfer : 3.90600

But on Machine 2 (RHEL-5.5 64-bit / PGI-Acc Workstation 10.6 / C1060), running ‘make run_ACC’ gives:
----- ACC -----
Result = 9.8272569E-02
Standard deviation = 1.0814508E-04
Difference from real PI value = 3.043320
Time in Seconds
Total : 16.73515
RNG : 15.06501
Compute : 0.23321
Data Xfer : 1.36283

I used the default settings given with the Makefile. All other examples are generating proper results.

Are there other compiler settings I should try specific to Tesla C1060 in order to get the accelerator example to work?

I am using the files from pgi_mc_example.tar.gz without any changes.

Regards,
Kevin

PS: All the other examples (CUDA C/Fortran) work fine on both machines.

Is this related to the implementation of reductions?

The best I could manage was the following, which gives correct results for the M.C. integration:

! ... partial mcACC.f90 listing

! this does not work !$acc data region local(temp), copyin(X,Y)

!$acc data region copyin(X,Y)

    call cpu_time(func_start)
    results%time(5) = results%time(5) &
                    + (func_start-datat)

!$acc region 

    do i=1,N
      tempVal = X(i)*X(i) + Y(i)*Y(i)
      if (tempVal < 1) then
         temp(i) = 1
      else
         temp(i) = 0
      endif
    enddo

!$acc end region

!$acc end data region

    do i=1,N
      sumA = sumA + temp(i)
      sumSq = sumSq + (temp(i)*temp(i))
    enddo

    call cpu_time(sum_end)

! ... end partial listing of mcACC.f90

I ran the above code on an Intel Xeon E5405 with a C1060 (one of four, with ACC_DEVICE_NUM set to 0).

  1. Using local(temp) in the !$acc data region directive gives me problems.

  2. Putting an !$acc region around the sum reduction also gave incorrect results.

Perhaps there is another environment variable I’m not setting right? I did not have problems with the above source code when using GTX 480 card with PVF 10.6 on another computer.

Regards,
Kevin

Hi Kevin,

I’m a bit embarrassed that this is my code and I didn’t catch this compiler error myself. The error does appear to be new in the 10.6 compiler. I’ll submit a bug report to get it fixed.

The problem is indeed the reduction, but it has to do with mixing both reductions in the same do loop. To work around the error, I would recommend splitting the sumSq reduction into its own do loop.

!$acc region
    do i=1,N
      tempVal = X(i)*X(i) + Y(i)*Y(i)
      if (tempVal < 1) then
         temp(i) = 1
      else
         temp(i) = 0
      endif
    enddo

    do i=1,N
      sumA = sumA + temp(i)
    enddo
    do i=1,N
      sumSq = sumSq + (temp(i)*temp(i))
    enddo
!$acc end region
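
To check the workaround above independently of the accelerator, here is a minimal host-only sketch. It is not part of pgi_mc_example.tar.gz: the program name, the value of N, the use of random_number for X and Y, and the 4*sumA/N scaling of the estimate are all assumptions made for illustration. It runs the split reduction loops on the CPU and compares them with a single fused loop.

```fortran
! Hypothetical host-only check; not taken from the tutorial sources.
program check_reductions
  implicit none
  integer, parameter :: N = 1000000          ! assumed sample count
  real, allocatable :: X(:), Y(:), temp(:)
  double precision :: sumA, sumSq, refA, refSq
  integer :: i

  allocate(X(N), Y(N), temp(N))
  call random_number(X)                      ! uniform samples in [0,1)
  call random_number(Y)

  ! Same hit test as in the listing: 1 inside the unit circle, else 0
  do i = 1, N
    if (X(i)*X(i) + Y(i)*Y(i) < 1.0) then
      temp(i) = 1.0
    else
      temp(i) = 0.0
    endif
  enddo

  ! Split reductions, one per loop, as in the workaround
  sumA = 0.0d0
  do i = 1, N
    sumA = sumA + temp(i)
  enddo
  sumSq = 0.0d0
  do i = 1, N
    sumSq = sumSq + temp(i)*temp(i)
  enddo

  ! Fused reference: both reductions in the same loop
  refA = 0.0d0
  refSq = 0.0d0
  do i = 1, N
    refA  = refA  + temp(i)
    refSq = refSq + temp(i)*temp(i)
  enddo

  ! Sums of 0/1 values are exact in double precision at this N
  if (sumA /= refA .or. sumSq /= refSq .or. sumA /= sumSq) then
    print *, 'MISMATCH: ', sumA, sumSq, refA, refSq
    stop 1
  endif

  ! Assumed scaling: quarter-circle area times 4
  print *, 'sums agree; pi estimate = ', 4.0d0*sumA/N
end program check_reductions
```

Because temp(i) is always 0 or 1, temp(i)*temp(i) equals temp(i), so sumSq should equal sumA exactly. If the accelerator build reports different values for the two sums, that points at the reduction rather than at the input data.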

Thanks,
Mat

FYI, this error (TPR#17105) will be fixed in the 10.8 release.

- Mat