I use PGI’s CUDA Fortran in my work and I can share a bit. First, I’ll readily admit I’m not much of a CUDA C programmer. I’ve done a bit with it (the usual matmul examples and all) and can read it, but I have been and always shall be a Fortran programmer at heart. That’s mainly because my graduate work, postdoctoral work, and current work all deal with codes ranging from FORTRAN 77 to Fortran 2003.
But, that said, I find programming in CUDA Fortran quite nice. If you come from a Fortran background and are used to arrays rather than pointers, and to the use of MODULEs, I think you’ll like it too. If nothing else, it reads easily. For example, say you have a device array dev_work(1024) and a host array host_work(1024): instead of calling cudaMemcpy(dev_work,host_work,1024), you can just say dev_work=host_work. The same goes for moving values to constant memory. (Note, though, that you can still use the cudaMemcpy-style API calls if you want. Most of them are supported, like, say, cudaMemcpy2DAsync.)
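To make that concrete, here’s a minimal sketch of what those assignment-style copies look like (the array names here are just illustrative, and you’d compile this with pgfortran):

```fortran
program copy_demo
  use cudafor
  implicit none
  real, dimension(1024)         :: host_work
  real, device, dimension(1024) :: dev_work

  host_work = 1.0
  dev_work  = host_work   ! host -> device copy, no cudaMemcpy call needed
  host_work = dev_work    ! device -> host copy, same assignment syntax
end program copy_demo
```

The compiler sees the device attribute on dev_work and generates the underlying memory transfer for you.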
Likewise, allocating memory on the device is as simple as using allocate() instead of cudaMalloc and deallocate() rather than cudaFree. And depending on how you declared your host variable, when you allocate on the host, it’ll allocate pinned/page-locked memory for you as well:
real, pinned, allocatable, dimension(:) :: host_work
is allocated as pinned. (Again, though, if you need/want to use cudaMallocPitch, you can!)
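Here’s a small sketch putting the allocation pieces together (again, the names are just illustrative):

```fortran
program alloc_demo
  use cudafor
  implicit none
  real, device, allocatable, dimension(:) :: dev_work
  real, pinned, allocatable, dimension(:) :: host_work

  allocate(dev_work(1024))    ! device memory, takes the place of cudaMalloc
  allocate(host_work(1024))   ! page-locked host memory, thanks to the pinned attribute

  host_work = 0.0
  dev_work  = host_work       ! fast transfer from pinned host memory

  deallocate(dev_work)        ! takes the place of cudaFree
  deallocate(host_work)
end program alloc_demo
```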
Indeed, most of what you want to do in CUDA C, you can do in CUDA Fortran. A few things like Textures haven’t been implemented yet, but most of the language is there, and if you find something that isn’t, PGI is usually pretty good at getting it implemented in a future release.
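For a sense of what writing a kernel looks like, here’s a sketch of a simple saxpy in CUDA Fortran (my own example, not from any particular code): the kernel is a module subroutine marked attributes(global), and you launch it with the same chevron syntax CUDA C uses.

```fortran
module kernels
contains
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real,    value :: a
    real :: x(n), y(n)
    integer :: i
    ! One thread per element; Fortran arrays are 1-based, hence the -1/+1
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy
end module kernels

program launch_demo
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 4096
  real, device :: x_d(n), y_d(n)
  real :: x(n), y(n)

  x = 1.0; y = 2.0
  x_d = x; y_d = y
  call saxpy<<<(n+255)/256, 256>>>(n, 2.0, x_d, y_d)
  y = y_d   ! the copy back waits for the kernel to finish
end program launch_demo
```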
I’ll also say that I’ve done some programming with PGI’s Accelerator pragmas (think OpenMP-style pragmas, but for GPUs), and have found them to be quite good. In recent releases, the Accelerator model can identify things like reductions and generate correct code for them, operations which are a bit tricky to program by hand in CUDA C/Fortran.
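As a rough sketch of what I mean, a sum reduction under the Accelerator model looks like an ordinary loop inside a directive region, and the compiler works out the parallel reduction itself (example mine):

```fortran
program reduce_demo
  implicit none
  integer, parameter :: n = 100000
  real :: a(n), total
  integer :: i

  a = 1.0
  total = 0.0
!$acc region
  ! The compiler recognizes this as a sum reduction on total and
  ! generates the GPU reduction code, no hand-written kernel needed
  do i = 1, n
     total = total + a(i)
  end do
!$acc end region
  print *, total
end program reduce_demo
```

Compare that with writing the equivalent shared-memory tree reduction by hand in a CUDA kernel, and the appeal is obvious.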
For more on what programming with PGI compilers and GPUs can look like, I recommend reading some of the PGI Insider articles. For example, this article on porting WRF shows that the pragma model can essentially match hand-tuned CUDA C code, with what I’m sure was much less effort, and that was with a compiler release almost a year old.