How can I translate to CUDA Fortran?

yasminpoyraz · December 10, 2017, 9:11pm

Hello,
I want to translate this Fortran code to CUDA Fortran.

subroutine polate(QLocal, Temperature, MoleculeIndex)
    implicit none
    integer :: i
    integer :: MoleculeIndex
    real*8 :: Temperature
    real*8 :: logt, x0, x1, f0, f1
    real*8 :: QLocal

    Do i = Firsti, Lasti, stepi
        if (TempPartFunc(i) == logt) then
            QLocal = lgQ(i, MoleculeIndex)
            exit
        elseif (TempPartFunc(i) < logt) then
            if (i == NumberOfTemperatures) then
                x0 = TempPartFunc(i - 1)
                x1 = TempPartFunc(i)
                f0 = lgQ(i - 1, MoleculeIndex)
                f1 = lgQ(i, MoleculeIndex)
            else
                x0 = TempPartFunc(i)
                x1 = TempPartFunc(i + 1)
                f0 = lgQ(i, MoleculeIndex)
                f1 = lgQ(i + 1, MoleculeIndex)
            endif
            QLocal = f0 + (f1 - f0)/(x1 - x0) * (logt - x0)
            exit
        endif
    end Do
    QLocal = 10.d0**QLocal

    return
end subroutine polate

I have translated the global variables Firsti, Lasti, stepi, TempPartFunc, logt, and lgQ into device variables.

I wrote the kernel:

attributes(global) subroutine polateGPU(QLocal, Temperaturez, MoleculeIndexz)

    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x

    Do i = Firsti_d, Lasti_d, stepi_d
        if (TempPartFunc_d(i) == logt) then
            QLocal = lgQ_d(i, MoleculeIndexz)
            exit
        elseif (TempPartFunc_d(i) < logt) then
            if (i == NumberOfTemperatures_d) then
                x0 = TempPartFunc_d(i - 1)
                x1 = TempPartFunc_d(i)
                f0 = lgQ_d(i - 1, MoleculeIndexz)
                f1 = lgQ_d(i, MoleculeIndexz)
            else
                x0 = TempPartFunc_d(i)
                x1 = TempPartFunc_d(i + 1)
                f0 = lgQ_d(i, MoleculeIndexz)
                f1 = lgQ_d(i + 1, MoleculeIndexz)
            endif
            QLocal = f0 + (f1 - f0)/(x1 - x0) * (logt - x0)
            exit
        endif
    end Do
    QLocal = 10.d0**QLocal
    endif

    return
end subroutine polateGPU

I think the integer i should be used as the thread index, and I call the kernel like this:

tBlock = dim3(MaxNumThreads, 1, 1)
grid = dim3(ceiling(real(NumberOfTemperatures)/tBlock%x), 1, 1)

call polateGPU<<< grid, tBlock >>>(Q_d(c, l_d), Temperature_d, MoleculeIndex_d)

But I always get a “segmentation fault (core dumped)” error. How can I fix this error?

MatColgrove · December 11, 2017, 7:09pm

Hi yasminpoyraz,

What are you trying to parallelize, and what is the goal of this code?

Typically, to start, you’d make the body of a loop the body of the CUDA Fortran kernel, with the loop index computed from the kernel’s launch configuration so that each thread handles one iteration.
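A minimal sketch of that mapping, shown on a hypothetical loop "do i = 1, n; x(i) = a*x(i); end do" rather than on polate: the DO statement disappears, i is computed from blockIdx, blockDim, and threadIdx, and a bounds check covers the extra threads in the last block. The names scale_mod, scale_kernel, and scale_on_gpu and the block size of 256 are illustrative assumptions, and it assumes compilation as CUDA Fortran (for example, a .cuf file built with pgfortran or nvfortran).

```fortran
! Sketch: the host loop  do i = 1, n; x(i) = a*x(i); end do  as a CUDA Fortran kernel.
! Hypothetical names; block size 256 is an arbitrary choice.
module scale_mod
  use cudafor
  implicit none
contains
  ! Kernel: the former loop body. Each thread handles one value of i.
  attributes(global) subroutine scale_kernel(n, a, x)
    integer, value :: n
    real(8), value :: a
    real(8) :: x(n)          ! array dummies of a global routine live in device memory
    integer :: i

    ! The global (1-based) thread index replaces the DO loop index.
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    ! The grid may contain more threads than iterations, so guard the index.
    if (i <= n) x(i) = a * x(i)
  end subroutine scale_kernel

  ! Host wrapper: copy in, launch one thread per former iteration, copy out.
  subroutine scale_on_gpu(x, a)
    real(8), intent(inout) :: x(:)
    real(8), intent(in) :: a
    real(8), device, allocatable :: x_d(:)
    type(dim3) :: grid, tBlock
    integer :: n

    n = size(x)
    allocate(x_d(n))
    x_d = x                                      ! host-to-device copy
    tBlock = dim3(256, 1, 1)
    grid = dim3(ceiling(real(n) / tBlock%x), 1, 1)
    call scale_kernel<<<grid, tBlock>>>(n, a, x_d)
    x = x_d                                      ! device-to-host copy; waits for the kernel
    deallocate(x_d)
  end subroutine scale_on_gpu
end module scale_mod
```

Keeping a DO loop over i inside the kernel, as polateGPU above does, overwrites the thread index, so every thread runs the whole loop and writes to the same QLocal.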

For this code I’d say start there, but the loop really isn’t parallelizable. It looks like you’re trying to find the first instance where “TempPartFunc(i)” <= logt and then set QLocal. However, this creates a dependency in the loop. You could evaluate all iterations of the loop in parallel, but if there were multiple matching instances, you wouldn’t know which “i” QLocal took its value from.
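If the real goal is to evaluate polate for many independent (Temperature, MoleculeIndex) pairs, one possible restructuring keeps the table search sequential inside each thread and parallelizes across the pairs instead, which sidesteps the loop dependency. This is a sketch under assumptions, not the poster’s design: it assumes that goal, assumes logt = log10(Temperature) (the posted code never sets logt), passes the tables as arguments instead of module globals, and uses hypothetical names (polate_mod, polate_dev, polate_batch, polate_on_gpu).

```fortran
! Sketch only. ASSUMPTIONS: many independent (Temperature, MoleculeIndex) pairs,
! and logt = log10(Temperature). All names are hypothetical.
module polate_mod
  use cudafor
  implicit none
contains
  ! Body of the original polate, run sequentially by a single thread.
  attributes(device) function polate_dev(temperature, m, nT, nMol, tpf, lgQ, &
                                         firsti, lasti, stepi) result(q)
    real(8), value :: temperature
    integer, value :: m, nT, nMol, firsti, lasti, stepi
    real(8) :: tpf(nT), lgQ(nT, nMol)
    real(8) :: q, logt, x0, x1, f0, f1, lq
    integer :: i

    logt = log10(temperature)   ! ASSUMPTION: not shown in the posted code
    lq = 0.0d0                  ! fallback if nothing matches (undefined in the original)
    do i = firsti, lasti, stepi
      if (tpf(i) == logt) then
        lq = lgQ(i, m)
        exit
      else if (tpf(i) < logt) then
        if (i == nT) then       ! past the last node: extrapolate from i-1 and i
          x0 = tpf(i - 1); x1 = tpf(i); f0 = lgQ(i - 1, m); f1 = lgQ(i, m)
        else
          x0 = tpf(i); x1 = tpf(i + 1); f0 = lgQ(i, m); f1 = lgQ(i + 1, m)
        end if
        lq = f0 + (f1 - f0) / (x1 - x0) * (logt - x0)
        exit
      end if
    end do
    q = 10.0d0**lq
  end function polate_dev

  ! One thread per pair: the thread index replaces a loop over pairs.
  attributes(global) subroutine polate_batch(nPairs, temps, mols, q, nT, nMol, &
                                             tpf, lgQ, firsti, lasti, stepi)
    integer, value :: nPairs, nT, nMol, firsti, lasti, stepi
    real(8) :: temps(nPairs), q(nPairs), tpf(nT), lgQ(nT, nMol)
    integer :: mols(nPairs)
    integer :: k

    k = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (k <= nPairs) q(k) = polate_dev(temps(k), mols(k), nT, nMol, tpf, lgQ, &
                                       firsti, lasti, stepi)
  end subroutine polate_batch

  ! Host wrapper: copy inputs to the device, launch, copy results back.
  subroutine polate_on_gpu(temps, mols, tpf, lgQ, firsti, lasti, stepi, q)
    real(8), intent(in) :: temps(:), tpf(:), lgQ(:, :)
    integer, intent(in) :: mols(:), firsti, lasti, stepi
    real(8), intent(out) :: q(:)
    real(8), device, allocatable :: temps_d(:), q_d(:), tpf_d(:), lgQ_d(:, :)
    integer, device, allocatable :: mols_d(:)
    type(dim3) :: grid, tBlock
    integer :: n, nT, nMol

    n = size(temps); nT = size(tpf); nMol = size(lgQ, 2)
    allocate(temps_d(n), q_d(n), mols_d(n), tpf_d(nT), lgQ_d(nT, nMol))
    temps_d = temps; mols_d = mols; tpf_d = tpf; lgQ_d = lgQ
    tBlock = dim3(128, 1, 1)
    grid = dim3(ceiling(real(n) / tBlock%x), 1, 1)
    call polate_batch<<<grid, tBlock>>>(n, temps_d, mols_d, q_d, nT, nMol, &
                                        tpf_d, lgQ_d, firsti, lasti, stepi)
    q = q_d                     ! device-to-host copy; waits for the kernel
    deallocate(temps_d, q_d, mols_d, tpf_d, lgQ_d)
  end subroutine polate_on_gpu
end module polate_mod
```

The interpolation only brackets correctly if the loop visits the table in a direction where the first match is the lower node, for example Firsti = NumberOfTemperatures, Lasti = 1, stepi = -1 over an ascending TempPartFunc; that is a guess, since the thread does not show those values. This layout only pays off with many pairs to evaluate; for a single lookup, a kernel launch costs far more than the search itself.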

I’m thinking you may want to back up a bit and take a look at some of the CUDA Fortran examples that ship with the compilers ($PGI/2017/examples/CUDA-Fortran) to see how to organize your code. You might also want to read this article on the basics of CUDA Fortran (CUDA Fortran Introduction | PGI) and possibly get the book “CUDA Fortran for Scientists and Engineers” (CUDA Fortran for Scientists and Engineers - 1st Edition).

Note that a segmentation fault occurs on the host, so it’s not a problem with the kernel itself. I could probably help you track this down if you can post the full source.

-Mat
