How can I translate this to CUDA Fortran?

I want to translate this Fortran code to CUDA Fortran.

subroutine polate(QLocal, Temperature, MoleculeIndex)

            implicit none
            integer :: i                                                                   
            integer :: MoleculeIndex                                                       
            real*8 :: Temperature                                                          
            real*8 :: logt, x0, x1, f0, f1                                                  
            real*8 :: QLocal                                                                

                Do i = Firsti, Lasti, stepi
                    if (TempPartFunc(i) == logt) then
                        QLocal = lgQ(i, MoleculeIndex)
                    elseif (TempPartFunc(i) < logt) then
                        if (i == NumberOfTemperatures) then
                            x0 = TempPartFunc(i - 1)
                            x1 = TempPartFunc(i)
                            f0 = lgQ(i - 1, MoleculeIndex)
                            f1 = lgQ(i, MoleculeIndex)
                        else
                            x0 = TempPartFunc(i)
                            x1 = TempPartFunc(i + 1)
                            f0 = lgQ(i, MoleculeIndex)
                            f1 = lgQ(i + 1, MoleculeIndex)
                        end if
                        QLocal = f0 + (f1 - f0)/(x1 - x0) * (logt - x0)
                    end if
                end Do
                QLocal = 10.d0**QLocal

        end subroutine polate

I copied the global variables Firsti, Lasti, stepi, TempPartFunc, logt, and lgQ to device variables.

I wrote the kernel:

attributes(global) subroutine polateGPU(QLocal, Temperaturez, MoleculeIndexz)

                i = blockDim%x * (blockIdx%x - 1) + threadIdx%x

                Do i = Firsti_d, Lasti_d, stepi_d
                    if (TempPartFunc_d(i) == logt) then
                        QLocal = lgQ_d(i, MoleculeIndexz)
                    elseif (TempPartFunc_d(i) < logt) then
                        if (i == NumberOfTemperatures_d) then
                            x0 = TempPartFunc_d(i - 1)
                            x1 = TempPartFunc_d(i)
                            f0 = lgQ_d(i - 1, MoleculeIndexz)
                            f1 = lgQ_d(i, MoleculeIndexz)
                        else
                            x0 = TempPartFunc_d(i)
                            x1 = TempPartFunc_d(i + 1)
                            f0 = lgQ_d(i, MoleculeIndexz)
                            f1 = lgQ_d(i + 1, MoleculeIndexz)
                        end if
                        QLocal = f0 + (f1 - f0)/(x1 - x0) * (logt - x0)
                    end if
                end Do
                QLocal = 10.d0**QLocal


        end subroutine polateGPU

I think the integer i should be used as the thread index,
and I call the kernel like this:

tBlock = dim3(MaxNumThreads, 1, 1)
grid = dim3(ceiling(real(NumberOfTemperatures)/tBlock%x), 1, 1)

call polateGPU<<< grid, tBlock >>>(Q_d(c, l_d), Temperature_d, MoleculeIndex_d)

But I always get a "Segmentation fault (core dumped)" error. How can I fix this error?

Hi jasminpoyraz,

What are you trying to parallelize, and what is the goal of this code?

Typically, to start, you’d make the body of a loop the body of the CUDA Fortran kernel, with the loop index computed from the thread and block indices and the loop bounds determining the kernel’s launch configuration.
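
As a rough illustration of that mapping, here is a minimal sketch using a made-up loop, y(i) = a * x(i), rather than your code. The names scale_mod, scale_kernel, x, y, a and n are all hypothetical, and the block size of 256 is an arbitrary choice.

```fortran
module scale_mod
  use cudafor
  implicit none
contains
  ! Replaces the host loop:  do i = 1, n;  y(i) = a * x(i);  end do
  attributes(global) subroutine scale_kernel(x, y, a, n)
    integer, value :: n
    real(8), value :: a
    real(8), intent(in)  :: x(n)
    real(8), intent(out) :: y(n)
    integer :: i

    ! The loop index becomes the global thread index (1-based).
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    ! The last block may have more threads than remaining elements.
    if (i <= n) y(i) = a * x(i)
  end subroutine scale_kernel
end module scale_mod

program scale_demo
  use cudafor
  use scale_mod
  implicit none
  integer, parameter :: n = 1000
  real(8) :: x(n), y(n)
  real(8), device :: x_d(n), y_d(n)
  type(dim3) :: grid, tBlock

  x = 1.0d0
  x_d = x                                  ! host-to-device copy

  ! The loop bounds become the launch configuration.
  tBlock = dim3(256, 1, 1)
  grid = dim3((n + tBlock%x - 1) / tBlock%x, 1, 1)
  call scale_kernel<<<grid, tBlock>>>(x_d, y_d, 2.0d0, n)

  y = y_d                                  ! device-to-host copy
  print *, 'max deviation from 2.0: ', maxval(abs(y - 2.0d0))
end program scale_demo
```

Note that the kernel contains no Do loop over i: each thread handles one iteration, which only works when the iterations are independent of each other.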

For this code I’d say start there, but the loop really isn’t parallelizable. It looks like you’re trying to find the first instance where “TempPartFunc(i)” <= logt and then setting QLocal. However, this causes a dependency in the loop. You could evaluate all iterations of the loop in parallel, but if you had multiple matching instances, you wouldn’t know which “i” QLocal took its value from.
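
One possible way around that dependency, if (and this is only a guess at your goal) you need the interpolated value for many molecules at once, is to keep the search serial inside each thread and parallelize over the molecule index instead. The sketch below assumes TempPartFunc is sorted in ascending order. The names polate_mod, polate_all, nT and nMol and the table values are hypothetical. Unlike your original routine, it also extrapolates from the first interval when logt lies below the table.

```fortran
module polate_mod
  use cudafor
  implicit none
contains
  ! One thread per molecule; each thread scans the temperature grid serially.
  attributes(global) subroutine polate_all(Q, lgQ, TempPartFunc, logt, nT, nMol)
    integer, value :: nT, nMol
    real(8), value :: logt
    real(8), intent(in)  :: lgQ(nT, nMol), TempPartFunc(nT)
    real(8), intent(out) :: Q(nMol)
    integer :: m, i, lo
    real(8) :: x0, x1, f0, f1

    m = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (m <= nMol) then
      ! Last grid point at or below logt, clamped to 1 .. nT-1
      ! (assumes TempPartFunc is ascending).
      lo = 1
      do i = 1, nT - 1
        if (TempPartFunc(i) <= logt) lo = i
      end do
      x0 = TempPartFunc(lo)
      x1 = TempPartFunc(lo + 1)
      f0 = lgQ(lo, m)
      f1 = lgQ(lo + 1, m)
      ! Linear interpolation in log space, then back to linear scale.
      Q(m) = 10.d0**(f0 + (f1 - f0)/(x1 - x0) * (logt - x0))
    end if
  end subroutine polate_all
end module polate_mod

program polate_demo
  use cudafor
  use polate_mod
  implicit none
  integer, parameter :: nT = 4, nMol = 3
  real(8) :: T(nT), lgQ(nT, nMol), Q(nMol)
  real(8), device :: T_d(nT), lgQ_d(nT, nMol), Q_d(nMol)
  integer :: m

  T = [1.0d0, 2.0d0, 3.0d0, 4.0d0]         ! made-up grid of log10(T) values
  do m = 1, nMol
    lgQ(:, m) = m * T                      ! made-up table, for illustration only
  end do
  T_d = T
  lgQ_d = lgQ

  call polate_all<<<1, 32>>>(Q_d, lgQ_d, T_d, 2.5d0, nT, nMol)
  Q = Q_d
  print *, Q
end program polate_demo
```

Whether this is worthwhile depends on how many molecules (or temperatures) you evaluate per call; for a single lookup the host version is almost certainly faster.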

I’m thinking you may want to back up a bit and take a look at some of the CUDA Fortran examples that ship with the compilers ($PGI/2017/examples/CUDA-Fortran) to see how to organize your code. You might also want to read an article on the basics of CUDA Fortran and possibly get the book “CUDA Fortran for Scientists and Engineers”.

Note that a segmentation fault occurs on the host, so it is not a problem with the kernel itself. I could probably help you track this down if you can post the full source.
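
For comparison, errors from the kernel side are reported through status codes rather than as a host segmentation fault. A minimal sketch of checking them after a launch is below. The kernel fill and the names fill_mod, a_d and n are hypothetical stand-ins, while cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString come from the cudafor module.

```fortran
module fill_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel, present only so there is something to launch.
  attributes(global) subroutine fill(a, n)
    integer, value :: n
    real(8) :: a(n)
    integer :: i
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) a(i) = 1.0d0
  end subroutine fill
end module fill_mod

program check_launch
  use cudafor
  use fill_mod
  implicit none
  integer, parameter :: n = 1024
  real(8), device :: a_d(n)
  integer :: istat

  call fill<<<(n + 255) / 256, 256>>>(a_d, n)

  ! Launch-time problems (for example an invalid configuration) show up here.
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch failed: ', cudaGetErrorString(istat)

  ! Errors raised while the kernel runs surface at the next synchronization.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel failed: ', cudaGetErrorString(istat)
end program check_launch
```

If both checks pass and the program still crashes, the fault is in host code, for instance in how the arguments to the launch are declared or indexed.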