CUDA Fortran and PGI Accelerator mix

Greetings. Is mixing CUDA Fortran (-Mcuda) with the PGI Accelerator Model (-ta=nvidia) supported? I saw a post from April 2010 saying they shouldn’t be used together, but that they might be at some point.

Thanks
BL

Hi BL,

Yes, they are now supported together. At one point they used different CUDA APIs, but we have since merged them so that they are compatible on all platforms (previously this worked on Linux but not on Windows). Note that the accelerator directives do recognize CUDA Fortran device variables, so you should not add data clauses to copy those variables. We also added a “!$CUF” kernel directive to CUDA Fortran (see http://www.pgroup.com/lit/articles/insider/v2n3a1.htm), which is essentially a ‘lite’ version of the PGI Accelerator model. It does not automate data movement, but it does create device kernels for you, and it uses the CUDA chevron syntax to give you control of the loop schedule.
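For reference, here is a minimal sketch of what a !$cuf kernel loop can look like (the module and variable names are illustrative, not from the article):

```fortran
module saxpy_mod
contains
  subroutine saxpy(n, a, x, y)
    integer, intent(in) :: n
    real, intent(in) :: a
    real, device :: x(n), y(n)   ! CUDA Fortran device arrays
    integer :: i
    ! Generate a device kernel for this loop; <<<*,*>>> lets the
    ! compiler choose the launch configuration (grid and block sizes).
    !$cuf kernel do <<< *, * >>>
    do i = 1, n
      y(i) = a * x(i) + y(i)
    end do
  end subroutine saxpy
end module saxpy_mod
```

You could also spell out the chevron arguments (e.g. <<< *, 256 >>>) to control the block size yourself.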

Hope this helps,
Mat

Thanks! That’s great to hear.
I tried to test mixing CUDA Fortran and PGI Accelerator directives. The code shown below compiles fine, but I get an error at runtime. I’m using Windows.

program fft_test
  use cudafor
  use precision
  use cufft
  complex(fp_kind), allocatable :: a(:), b(:), c(:)
  complex(fp_kind), device, allocatable :: a_d(:), b_d(:)
  integer :: n
  integer :: plan

  n = 8

  ! allocate arrays on the host
  allocate (a(n), b(n), c(n))

  ! allocate arrays on the device
  allocate (a_d(n))
  allocate (b_d(n))

  ! initialize arrays on host
  a = 1; c = 0

  ! copy arrays to device
  a_d = a

  ! print initial array
  print *, "Array A:"
  print *, a

  ! initialize the plan
  call cufftPlan1D(plan, n, CUFFT_Z2Z, 1)

  ! execute FFTs
  call cufftExecZ2Z(plan, a_d, b_d, CUFFT_FORWARD)

  !call cufftExecZ2Z(plan, b_d, b_d, CUFFT_INVERSE)

  ! copy results back to host
  b = b_d

  ! print transformed array
  print *, "Array B"
  print *, b

  ! add arrays
  !$acc region
  do j = 1, n
    c(j) = a(j) + b(j)
  enddo
  !$acc end region
  print *, "Array C"
  print *, c

  ! release memory on the host
  deallocate (a, b, c)

  ! release memory on the device
  deallocate (a_d, b_d)

  ! destroy the plan
  call cufftDestroy(plan)

end program fft_test

This is the compile output

>pgf90 precision.f90 cufft.f90 fft_test.f90 -o main -Mcuda=cuda3.2 -ta=nvidia:cuda3.2 -Minfo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\lib\x64\cufft.lib"
precision.f90:
cufft.f90:
fft_test.f90:
fft_test:
     20, Memory set idiom, array assignment replaced by call to pgf90_msetz16
         Memory zero idiom, array assignment replaced by call to pgf90_mzeroz16
     49, Generating copyin(b(1:8))
         Generating copyin(a(1:8))
         Generating copyout(c(1:8))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     50, Loop is parallelizable
         Accelerator kernel generated
         50, !$acc do parallel, vector(8) ! blockidx%x threadidx%x
             CC 1.3 : 11 registers; 52 shared, 4 constant, 0 local memory bytes;
 25% occupancy
             CC 2.0 : 18 registers; 4 shared, 64 constant, 0 local memory bytes;
 16% occupancy

This is the output

 Array A:
 (1.000000000000000,0.000000000000000)  (1.000000000000000,0.000000000000000)
 (1.000000000000000,0.000000000000000)  (1.000000000000000,0.000000000000000)
 (1.000000000000000,0.000000000000000)  (1.000000000000000,0.000000000000000)
 (1.000000000000000,0.000000000000000)  (1.000000000000000,0.000000000000000)
 Array B
 (8.000000000000000,0.000000000000000)  (0.000000000000000,0.000000000000000)
 (0.000000000000000,0.000000000000000)  (0.000000000000000,0.000000000000000)
 (0.000000000000000,0.000000000000000)  (0.000000000000000,0.000000000000000)
 (0.000000000000000,0.000000000000000)  (0.000000000000000,0.000000000000000)
call to cuMemAlloc returned error 201: Invalid context
CUDA driver version: 3020

The error seems to occur when the accelerator region is entered. What does this error mean? I first compiled without specifying cuda3.2 and thought that might be causing a version mismatch.

Regards
BL

Hi BL,

You’re missing an interface for the CUFFT routines. Without an interface, the compiler must treat the calls using F77 calling semantics, which are incorrect here.

Take a look at this article from the latest PGInsider (http://www.pgroup.com/lit/articles/insider/v3n1a5.htm), which shows how to call the CUBLAS, CULA, and Magma BLAS libraries. The same methods can be used to call CUFFT.
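To illustrate the idea, here is a rough sketch of such an explicit interface module for the CUFFT routines used above. The declarations follow the CUFFT C API; treat this as an outline rather than the exact module from the article:

```fortran
module cufft
  ! Constants mirroring the CUFFT C header
  integer, parameter :: CUFFT_FORWARD = -1
  integer, parameter :: CUFFT_INVERSE =  1
  integer, parameter :: CUFFT_Z2Z = Z'69'   ! double complex to double complex

  interface
    subroutine cufftPlan1d(plan, nx, type, batch) &
        bind(C, name='cufftPlan1d')
      use iso_c_binding
      integer(c_int) :: plan            ! handle returned by CUFFT
      integer(c_int), value :: nx, type, batch
    end subroutine

    subroutine cufftExecZ2Z(plan, idata, odata, direction) &
        bind(C, name='cufftExecZ2Z')
      use iso_c_binding
      integer(c_int), value :: plan
      complex(8), device :: idata(*), odata(*)   ! device arrays
      integer(c_int), value :: direction
    end subroutine

    subroutine cufftDestroy(plan) bind(C, name='cufftDestroy')
      use iso_c_binding
      integer(c_int), value :: plan
    end subroutine
  end interface
end module cufft
```

With an interface like this in scope, the compiler checks the argument types and passes the `value` arguments by value as the C library expects.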

Hope this helps,
Mat

Hello Mat,
I am using an interface for the CUFFT library. The output for array b shows that the call to the CUFFT routine was successful: array b is the transform of array a. It is when the program enters the !$acc region that I get the error “call to cuMemAlloc returned error 201: Invalid context”.

Is this still due to an interface problem?

Thanks
BL

Hi BL,

I am using an interface for the CUFFT library.

Sorry, I was in a rush yesterday and missed the ‘use cufft’.

Is this still due to an interface problem?

Probably not but I would need to investigate further to determine the actual problem. Let me try to reproduce the error and see what I can determine.

Mat

Hi BL,

It appears that you’re using the example CUDA Fortran calling CUFFT code found on the CUDA Musing blog (http://cudamusing.blogspot.com/2010/05/calling-cufft-from-cuda-fortran.html). Using the same cufft module and your modified source, I was able to build and run the executable with both CUDA Fortran and the PGI Accelerator model enabled. Unfortunately, the code ran correctly and I could not reproduce the reported error.

Most likely it’s a problem with your CUDA device driver. Would you mind updating to the latest driver version for your device?

Mat

Thanks Mat. Are you referring to CUDA v4.0?

Regards
BL

Are you referring to CUDA v4.0?

You can find the latest CUDA 3.2 development drivers here: http://developer.nvidia.com/cuda-toolkit-32-downloads. The CUDA 4.0 development drivers should work as well, but they are still in pre-release (http://developer.nvidia.com/cuda-toolkit-40).

Mat