Problems with shared memory in CUDA Fortran

Hello All,

I am having issues when using shared memory arrays in CUDA Fortran. It would be great if someone could look at the code and identify the errors.
I uploaded the files to MediaFire.

In folder v1, I didn’t use shared memory and the code is working fine.

To get more performance, I restructured the code to use two-dimensional thread blocks and, along with that, shared memory. The new version gives garbage results. The logic looks sound to me, so I must be making a mistake that I cannot see. The modified code is in folder v2.

The troublesome files are xi.f90 and enrgy_eval.f90. When I mix the xi and enrgy_eval files from the two folders, I get memory errors.

Any help is greatly appreciated.

Hi Bharat,

I added error checking right after the kernel launch:

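        ! dimBlock is an assumed name for the block dimensions; the launch needs both grid and block
        call xi2_kernel<<<dimGrid,dimBlock>>>(xd,yd,zd,WLKR,NCL_NO,wf2d)
        ! report any launch failure from the call above
        istat = cudaGetLastError()
        print *, cudaGetErrorString(istat)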

I get the following runtime error:

 too many resources requested for launch

The “-Mcuda=ptxinfo” flag shows that your kernel uses 37 registers per thread. With 1024 threads per block, the block needs 37 x 1024 = 37888 registers, but the maximum number of registers available to a block on a C2060 is 32768.

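To fix this, you need to either reduce the number of threads per block or use the flag “-Mcuda=maxregcount:32” to cap the number of registers per thread. With 1024 threads per block, 32768 / 1024 = 32 is the most each thread can use. Note that capping registers can cause spills to local memory, which may cost some performance.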
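The arithmetic behind both options can be sketched in a few lines of plain Fortran (no GPU needed). This is only a rough check, not something from your code: the module and function names are made up for illustration, the 32768 limit and the 37 registers per thread are the numbers quoted above, and a warp size of 32 is assumed. It also ignores the hardware's register allocation granularity, so treat the results as upper bounds.

```fortran
module reg_budget
  implicit none
contains
  ! Largest per-thread register count that still fits the per-block limit.
  pure integer function max_regs_per_thread(regs_per_block, threads_per_block) result(r)
    integer, intent(in) :: regs_per_block, threads_per_block
    r = regs_per_block / threads_per_block          ! integer division rounds down
  end function max_regs_per_thread

  ! Largest block size, rounded down to whole warps, for a given register use.
  pure integer function max_threads_per_block(regs_per_block, regs_per_thread, warp_size) result(t)
    integer, intent(in) :: regs_per_block, regs_per_thread, warp_size
    t = (regs_per_block / regs_per_thread / warp_size) * warp_size
  end function max_threads_per_block
end module reg_budget

program check_budget
  use reg_budget
  implicit none
  integer, parameter :: regs_per_block  = 32768   ! limit quoted above
  integer, parameter :: regs_per_thread = 37      ! reported by -Mcuda=ptxinfo
  integer, parameter :: threads         = 1024    ! current block size
  integer, parameter :: warp_size       = 32      ! assumed warp size

  print *, 'registers requested per block:   ', regs_per_thread * threads
  print *, 'max regs/thread at 1024 threads: ', max_regs_per_thread(regs_per_block, threads)
  print *, 'max threads at 37 regs/thread:   ', &
           max_threads_per_block(regs_per_block, regs_per_thread, warp_size)
end program check_budget
```

With these inputs the two helpers return 32 and 864, so either a register cap of 32 or a smaller block should get the launch under the limit. A 2D block such as 16 x 16 or 32 x 16 would fit comfortably.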

Hope this helps,

Hello Mat,

Thanks for the info.