How can I translate to CUDA Fortran?

Hello,
I want to translate this Fortran code to CUDA Fortran:

subroutine polate(QLocal, Temperature, MoleculeIndex)

            implicit none
            integer :: i                                                                   
            integer :: MoleculeIndex                                                       
            real*8 :: Temperature                                                          
            real*8 :: logt, x0, x1, f0, f1                                                  
            real*8 :: QLocal                                                                

                Do i = Firsti, Lasti, stepi
                    if (TempPartFunc(i) == logt) then
                        QLocal = lgQ(i, MoleculeIndex)
                        exit
                    elseif (TempPartFunc(i) < logt) then
                        if (i == NumberOfTemperatures) then
                            x0 = TempPartFunc(i - 1)
                            x1 = TempPartFunc(i)
                            f0 = lgQ(i - 1, MoleculeIndex)
                            f1 = lgQ(i, MoleculeIndex)
                        else
                            x0 = TempPartFunc(i)
                            x1 = TempPartFunc(i + 1)
                            f0 = lgQ(i, MoleculeIndex)
                            f1 = lgQ(i + 1, MoleculeIndex)
                        endif
                        QLocal = f0 + (f1 - f0)/(x1 - x0) * (logt - x0)

                        exit
                    endif
                end Do
                QLocal = 10.d0**QLocal

            return
        end subroutine polate

I converted the global variables Firsti, Lasti, stepi, TempPartFunc, logt, and lgQ to device variables.

I wrote the kernel:

attributes(global) subroutine polateGPU(QLocal, Temperaturez, MoleculeIndexz)


            i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
                
                Do i = Firsti_d, Lasti_d, stepi_d
                    if (TempPartFunc_d(i) == logt) then
                        QLocal = lgQ_d(i, MoleculeIndexz)
                        exit

                    elseif (TempPartFunc_d(i) < logt) then
                        if (i == NumberOfTemperatures_d) then
                            x0 = TempPartFunc_d(i - 1)
                            x1 = TempPartFunc_d(i)
                            f0 = lgQ_d(i - 1, MoleculeIndexz)
                            f1 = lgQ_d(i, MoleculeIndexz)
                        else
                            x0 = TempPartFunc_d(i)
                            x1 = TempPartFunc_d(i + 1)
                            f0 = lgQ_d(i, MoleculeIndexz)
                            f1 = lgQ_d(i + 1, MoleculeIndexz)
                        endif
                        QLocal = f0 + (f1 - f0)/(x1 - x0) * (logt - x0)

                    
                        exit
                    endif
                end Do
                QLocal = 10.d0**QLocal

              
            endif

            return
        end subroutine polateGPU

I think the integer i should be used as the thread index,
and I call the kernel like this:

tBlock = dim3(MaxNumThreads, 1, 1)
grid = dim3(ceiling(real(NumberOfTemperatures)/tBlock%x), 1, 1)

call polateGPU<<< grid, tBlock >>>(Q_d(c, l_d), Temperature_d, MoleculeIndex_d)

But I always get a "segmentation fault (core dumped)" error. How can I fix this error?

Hi jasminpoyraz,

What are you trying to parallelize, and what is the goal with this code?

Typically, to start, you’d make the body of a loop the body of the CUDA Fortran kernel, with the loop index computed from the thread and block indices and the loop bounds determining the kernel’s launch configuration.
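As a generic illustration of that mapping (not your code), here is a minimal sketch. The names saxpy_mod, saxpy, x_d and y_d are made up for the example, and the block size of 256 is an arbitrary choice:

```fortran
module saxpy_mod
  use cudafor
  implicit none
contains
  ! Host loop being replaced:
  !   do i = 1, n
  !     y(i) = a*x(i) + y(i)
  !   end do
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real(8), value :: a
    real(8) :: x(n), y(n)
    integer :: i
    ! The former loop index is computed from the block and thread indices.
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    ! Guard: the grid may contain more threads than there are iterations.
    if (i <= n) y(i) = a*x(i) + y(i)
  end subroutine saxpy
end module saxpy_mod

program main
  use cudafor
  use saxpy_mod
  implicit none
  integer, parameter :: n = 1000
  real(8) :: x(n), y(n)
  real(8), device :: x_d(n), y_d(n)
  type(dim3) :: grid, tBlock

  x = 1.0d0
  y = 2.0d0
  x_d = x          ! host-to-device copies
  y_d = y

  ! The loop bounds (1..n) decide how many blocks are launched.
  tBlock = dim3(256, 1, 1)
  grid = dim3(ceiling(real(n)/tBlock%x), 1, 1)
  call saxpy<<<grid, tBlock>>>(n, 2.0d0, x_d, y_d)

  y = y_d          ! device-to-host copy
  print *, 'max error: ', maxval(abs(y - 4.0d0))
end program main
```

Note that the kernel has no do loop over i at all: each thread handles exactly one former iteration.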

For this code I’d say start there, but the loop really isn’t parallelizable. It looks like you’re trying to find the first instance where “TempPartFunc(i)” <= logt and then setting QLocal. However, this causes a dependency in the loop. You could evaluate all iterations of the loop in parallel, but if you had multiple matching instances, you wouldn’t know which “i” QLocal took its value from.
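If the real workload turns out to be many independent lookups (that is an assumption on my part, since the post only shows a single call), one option is to keep the search sequential inside each thread and parallelize across the lookups instead. A rough sketch follows. The names polate_many, tgrid, lgq, logt, mol and q are hypothetical, the table is assumed to be sorted in ascending order with at least two entries, and values below the table are extrapolated from the first interval, which the original routine does not define:

```fortran
module polate_mod
  use cudafor
  implicit none
contains
  ! One thread per lookup; the table search stays sequential in each thread.
  ! Assumes tgrid is sorted in ascending order and nt >= 2.
  attributes(global) subroutine polate_many(nq, nt, nm, tgrid, lgq, logt, mol, q)
    integer, value :: nq, nt, nm
    real(8) :: tgrid(nt), lgq(nt, nm), logt(nq), q(nq)
    integer :: mol(nq)
    integer :: k, i, i0
    real(8) :: x0, x1, f0, f1

    k = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (k <= nq) then
      ! Search downward for the last grid point at or below logt(k).
      ! Starting at nt - 1 makes values above the table use the last interval.
      i0 = 1   ! assumed fallback: values below the table use the first interval
      do i = nt - 1, 1, -1
        if (tgrid(i) <= logt(k)) then
          i0 = i
          exit
        end if
      end do

      ! Linear interpolation in log space, then back to linear space.
      x0 = tgrid(i0)
      x1 = tgrid(i0 + 1)
      f0 = lgq(i0, mol(k))
      f1 = lgq(i0 + 1, mol(k))
      q(k) = 10.0d0**(f0 + (f1 - f0)/(x1 - x0)*(logt(k) - x0))
    end if
  end subroutine polate_many
end module polate_mod

program main
  use cudafor
  use polate_mod
  implicit none
  integer, parameter :: nt = 3, nm = 1, nq = 3
  real(8) :: tgrid(nt), lgq(nt, nm), logt(nq), q(nq)
  integer :: mol(nq)
  real(8), device :: tgrid_d(nt), lgq_d(nt, nm), logt_d(nq), q_d(nq)
  integer, device :: mol_d(nq)

  ! Toy table: lgq = tgrid - 1, so linear interpolation is exact.
  tgrid = (/ 1.0d0, 2.0d0, 3.0d0 /)
  lgq(:, 1) = tgrid - 1.0d0
  logt = (/ 1.5d0, 2.0d0, 3.5d0 /)
  mol = 1

  tgrid_d = tgrid
  lgq_d = lgq
  logt_d = logt
  mol_d = mol

  call polate_many<<<(nq + 127)/128, 128>>>(nq, nt, nm, tgrid_d, lgq_d, logt_d, mol_d, q_d)
  q = q_d

  print *, 'max relative error: ', maxval(abs(q/10.0d0**(logt - 1.0d0) - 1.0d0))
end program main
```

Each thread writes only its own q(k), so there is no ambiguity about which iteration set the result. Whether this fits depends on how polate is actually called in your application.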

I’m thinking you may want to back up a bit and take a look at some of the CUDA Fortran examples that ship with the compilers ($PGI/2017/examples/CUDA-Fortran) to see how to organize your code. You might want to read this article on the basics of CUDA Fortran (http://www.pgroup.com/blogs/posts/cuda-fortran-intro.htm) and possibly get the book “CUDA Fortran for Scientists and Engineers” (https://www.elsevier.com/books/cuda-fortran-for-scientists-and-engineers/ruetsch/978-0-12-416970-8).

Note that a segmentation fault occurs on the host, so it is not a problem with the kernel itself. I could probably help you track this down if you can post the full source.
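In the meantime, checking the CUDA status around the launch can help separate host-side problems from kernel problems. A minimal sketch using cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString from the cudafor module; the kernel fill and the array a_d are placeholders, not part of your code:

```fortran
module check_mod
  use cudafor
  implicit none
contains
  ! Placeholder kernel: writes each element's own index into the array.
  attributes(global) subroutine fill(n, a)
    integer, value :: n
    real(8) :: a(n)
    integer :: i
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) a(i) = real(i, 8)
  end subroutine fill
end module check_mod

program main
  use cudafor
  use check_mod
  implicit none
  integer, parameter :: n = 1024
  real(8), device, allocatable :: a_d(:)
  real(8) :: a(n)
  integer :: istat

  ! Device arrays must be allocated before they are passed to a kernel.
  allocate(a_d(n), stat=istat)
  if (istat /= 0) stop 'device allocation failed'

  call fill<<<(n + 255)/256, 256>>>(n, a_d)

  ! Errors in the launch itself (for example a bad configuration).
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Errors raised while the kernel was running.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel error: ', trim(cudaGetErrorString(istat))

  a = a_d
  print *, 'a(1), a(n) = ', a(1), a(n)
  deallocate(a_d)
end program main
```

If the program crashes before either check reports anything, the problem is most likely in the host code that runs before or during the launch.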

-Mat