The output is wrong! It seems the GPU doesn't do the work

oscar_ml · October 30, 2018, 4:01pm

Hi everybody

I am taking my first steps with CUDA Fortran, but I am running into a problem with the following simple code, incTest.cuf:

module simpleOps_m
contains
  attributes(global) subroutine inc(a,b)
    implicit none
    integer :: a(:)
    integer, value :: b
    integer :: i

    i = threadIdx%x      ! one thread per element; the launch uses a single block
    a(i) = a(i) + b

  end subroutine inc
end module simpleOps_m

program incTest
  use cudafor
  use simpleOps_m
  implicit none
  integer :: b, n=256
  integer, allocatable :: a(:)
  integer, allocatable, device :: a_d(:)

  allocate (a(n), a_d(n))
  a=1
  b=3
  
  a_d=a
  call inc<<<1,n>>>(a_d,b)
  a=a_d
 
  if (all(a==4)) &
     write(*,*) 'Test Passed'
  deallocate (a,a_d)

end program incTest

I compiled the code with pgf90 -Minfo -o exe incTest.cuf and got the following:
inctest:
33, all reduction inlined

but when I ran it, I didn’t get the message ‘Test Passed’. Before that, I had tested an even simpler code, copydat.cuf, and it worked:

program copyData
  use cudafor
  implicit none
  integer, parameter :: n=256
  real :: a(n), b(n)
  real, device :: a_d(n),b_d(n)

  a=1.0
  a_d=a
  b_d=a_d
  b= b_d

  if (all(a==b)) &
    write(*,*) 'Test passed'
end program copyData

What test can I run to find out what the problem is?

Thanks for your time

MatColgrove · October 30, 2018, 9:39pm

Hi oscar_ml,

Most likely your kernel is failing for some reason. CUDA won’t throw an error, so it’s best to add error checking after each kernel launch. (See below).

What device are you using? I suspect that your device is newer or older than the default targets for your compiler version. For example, PGI 18.4 targets cc35, cc50, and cc60, since it defaults to CUDA 8.0. Later compiler versions also add cc70 and use CUDA 9.0.

You may need to set the target compute capability for your device, such as “-Mcuda=cc70” for a V100.

Note that in our 18.10 release, we’ll default to targeting the device that’s detected on the compiling system.
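
If you are not sure which compute capability your device has, you can ask the CUDA runtime. The following is only a minimal sketch: the program name queryCC is made up for illustration, and it assumes the cudaGetDeviceCount and cudaGetDeviceProperties interfaces and the cudaDeviceProp type (with its name, major, and minor fields) from the cudafor module.

```fortran
program queryCC
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, ndev, i

  ! Ask the runtime how many CUDA devices are visible.
  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess) then
    print *, 'cudaGetDeviceCount failed: ', trim(cudaGetErrorString(istat))
    stop
  end if

  ! Device numbers are zero-based.
  do i = 0, ndev-1
    istat = cudaGetDeviceProperties(prop, i)
    if (istat /= cudaSuccess) cycle
    ! major/minor hold the compute capability, e.g. 3 and 0 for cc30.
    write(*,'(a,i0,2a)')   'Device ', i, ': ', trim(prop%name)
    write(*,'(a,i0,a,i0)') '  compute capability: ', prop%major, '.', prop%minor
    write(*,'(a,i0,i0)')   '  matching flag: -Mcuda=cc', prop%major, prop%minor
  end do
end program queryCC
```

Compile it like any other .cuf file and run it on the machine that has the GPU. The pgaccelinfo utility that ships with the PGI compilers should report similar information without writing any code.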

% cat test.cuf
module simpleOps_m
contains
  attributes(global) subroutine inc(a,b)
    implicit none
    integer :: a(:)
    integer, value :: b
    integer :: i

    i = threadIdx%x      ! one thread per element; the launch uses a single block
    a(i) = a(i) + b

  end subroutine inc
end module simpleOps_m

program incTest
  use cudafor
  use simpleOps_m
  implicit none
  integer :: b, n=256, rc
  integer, allocatable :: a(:)
  integer, allocatable, device :: a_d(:)

  allocate (a(n), a_d(n))
  a=1
  b=3

  a_d=a
  call inc<<<1,n>>>(a_d,b)
  rc = cudaGetLastError()
  if (rc.ne.0) then
   print *, "Error: ", cudaGetErrorString(rc)
  endif
  a=a_d

  if (all(a==4)) &
     write(*,*) 'Test Passed'
  deallocate (a,a_d)

end program incTest
% pgf90 test.cuf -V18.4 -Mcuda=cc35
% a.out
 Error:
 invalid device function
% pgf90 test.cuf -V18.4 -Mcuda=cc70
% a.out
 Test Passed
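
If you launch more than one kernel, the check can be wrapped in a small helper. This is a sketch under assumptions, not a PGI-provided routine: the module name checkCuda_m and the subroutine checkKernel are hypothetical. It checks both the launch itself (cudaGetLastError) and errors that only show up while the kernel runs (the return value of cudaDeviceSynchronize).

```fortran
module checkCuda_m
  use cudafor
  implicit none
contains
  ! Call right after a kernel launch; label is only used in the messages.
  subroutine checkKernel(label)
    character(len=*), intent(in) :: label
    integer :: ierrSync, ierrAsync

    ! Errors detected at launch time, e.g. "invalid device function".
    ierrSync = cudaGetLastError()
    ! Wait for the kernel to finish so that run-time errors are reported too.
    ierrAsync = cudaDeviceSynchronize()

    if (ierrSync /= cudaSuccess) &
      write(*,*) trim(label), ' launch error: ', trim(cudaGetErrorString(ierrSync))
    if (ierrAsync /= cudaSuccess) &
      write(*,*) trim(label), ' execution error: ', trim(cudaGetErrorString(ierrAsync))
  end subroutine checkKernel
end module checkCuda_m
```

In the test program this would be used as "use checkCuda_m" followed by "call checkKernel('inc')" on the line after the kernel launch. The synchronize call costs some performance, so you may want it only in debug builds.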

Hope this helps,
Mat

oscar_ml · October 31, 2018, 12:05pm

Mat

Many thanks for your reply. You were right. I checked my compiler version:

% pgf90 --version
pgf90 18.4-0 64-bit target on x86-64 Linux -tp sandybridge
PGI Compilers and Tools
Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.

and my device, which is a Quadro K600. Then I compiled the code with the flag -Mcuda=cc30 and got the right answer:

% pgf90 test.cuf -V18.4 -Mcuda=cc30
% a.out
 Test Passed

It was a silly mistake. Next time I will come back with a more challenging question ;)

BTW: Do you have a link to a heat example using CUDA Fortran? It would be very useful for writing my app.

MatColgrove · October 31, 2018, 2:51pm

BTW: Do you have a link to a heat example using CUDA Fortran? It would be very useful for writing my app.

We do have a Laplace example as part of our OpenACC training, but not for CUDA Fortran.

The Chapter 8 example sources for Parallel Programming with OpenACC include a 2D heat program in both OpenACC and CUDA C.

They might at least show the algorithm and code structure, which you could then translate into CUDA Fortran.
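
To give an idea of what such a translation could look like, here is a rough sketch of an explicit 2D heat update in CUDA Fortran. It is not the example from the book: the names heat_m, heatStep, and heat2d, the grid size, the boundary condition, and the coefficient r = 0.25 are all assumptions chosen for illustration.

```fortran
module heat_m
contains
  ! One explicit update of the interior points of an nx-by-ny grid.
  attributes(global) subroutine heatStep(tNew, tOld, nx, ny, r)
    implicit none
    integer, value :: nx, ny
    real, value :: r              ! r = alpha*dt/dx**2; keep r <= 0.25 for stability
    real :: tNew(nx,ny)
    real, intent(in) :: tOld(nx,ny)
    integer :: i, j

    ! Global 2D index of this thread (CUDA Fortran indices start at 1).
    i = (blockIdx%x-1)*blockDim%x + threadIdx%x
    j = (blockIdx%y-1)*blockDim%y + threadIdx%y

    ! Skip boundary points and threads that fall outside the grid.
    if (i > 1 .and. i < nx .and. j > 1 .and. j < ny) then
      tNew(i,j) = tOld(i,j) + r*(tOld(i-1,j) + tOld(i+1,j) &
                               + tOld(i,j-1) + tOld(i,j+1) - 4.0*tOld(i,j))
    end if
  end subroutine heatStep
end module heat_m

program heat2d
  use cudafor
  use heat_m
  implicit none
  integer, parameter :: nx = 256, ny = 256, nsteps = 100
  real, allocatable :: t(:,:)
  real, allocatable, device :: a_d(:,:), b_d(:,:)
  type(dim3) :: grid, tBlock
  integer :: step, istat

  allocate(t(nx,ny), a_d(nx,ny), b_d(nx,ny))
  t = 0.0
  t(1,:) = 1.0                    ! one hot edge, everything else cold
  a_d = t
  b_d = t                         ! both buffers need the boundary values

  tBlock = dim3(16,16,1)
  grid = dim3((nx+15)/16, (ny+15)/16, 1)

  ! Two launches per iteration so the result always ends up in a_d.
  do step = 1, nsteps, 2
    call heatStep<<<grid,tBlock>>>(b_d, a_d, nx, ny, 0.25)
    call heatStep<<<grid,tBlock>>>(a_d, b_d, nx, ny, 0.25)
  end do
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'Error: ', trim(cudaGetErrorString(istat))

  t = a_d
  print *, 'T(2,ny/2) after', nsteps, 'steps:', t(2,ny/2)
  deallocate(t, a_d, b_d)
end program heat2d
```

The bounds check in the kernel matters here because the launch grid is rounded up to a multiple of the block size. Swapping the two device arrays between launches avoids copying the grid back to the host on every step.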
