Running a loop in a kernel (CUDA Fortran)

Hello everyone,

I recently started working with CUDA Fortran and have a basic grasp of the syntax and of how the GPU works. I have started converting my existing serial code to GPU code, and I have a small question:

So this is my global subroutine:

attributes(global) subroutine fpi_solver(prim_d, q_d, dq_d, flux_res_d, x_d, nbhs_d, conn_d)

Later, I compute the index:

i = (blockIdx%x-1)* blockDim%x + threadIdx%x
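In CUDA Fortran, `blockIdx` and `threadIdx` are 1-based (unlike CUDA C), which is why the formula subtracts 1 from `blockIdx%x` but not from `threadIdx%x`. The host-only sketch below mirrors that arithmetic so it can be checked without a GPU; the module and function names are hypothetical, not part of the original code.

```fortran
module index_demo
  implicit none
contains
  ! Hypothetical helper reproducing the kernel's index formula on the host.
  ! bidx and tidx are 1-based, as blockIdx%x and threadIdx%x are in CUDA Fortran.
  pure integer function global_index(bidx, bdim, tidx)
    integer, intent(in) :: bidx, bdim, tidx
    global_index = (bidx - 1) * bdim + tidx
  end function global_index
end module index_demo
```

With blockDim%x = 256, the first thread of the first block gets i = 1, the last thread of that block gets i = 256, and the first thread of the second block gets i = 257.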

Now, for this point i, I loop over its neighbours:

do k = 1, neighbours(i)
   sum = sum + value(k)
end do

I tried using the loop directly, but for some reason the kernel does not get invoked; if I comment out the loop, it does. Apologies if this is a basic question, but is there something I am missing? Do you have any suggestions on how to proceed?

Thank you very much,

Srikanth

I directly tried to use the loop but for some reason, this kernel is not getting invoked.

Is it not getting invoked or is it erroring?

Errors in CUDA kernels are silent unless you check for them, so it’s a good idea to check for errors after the kernel launch. Try adding something like the following:

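call fpi_solver<<<...>>>(..args..)
istat = cudaGetLastError()       ! launch/configuration errors
if (istat /= cudaSuccess) then
   print *, cudaGetErrorString(istat)
   stop 1
endif
istat = cudaDeviceSynchronize()  ! errors raised while the kernel runs
if (istat /= cudaSuccess) then
   print *, cudaGetErrorString(istat)
   stop 1
endif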

I suspect the problem is with “neighbours(i)”. You might be going out of bounds (are you checking that “i” is a valid index?), or its values aren’t being updated on the device, so it contains garbage.
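Putting those two suggestions together, here is a minimal sketch assuming a flattened neighbour list: `nbr_count(i)` holds the number of neighbours of point `i` and `nbr_list(k, i)` the index of its k-th neighbour. These names, the module, and the host wrapper are hypothetical, not taken from the original code. The kernel returns early for out-of-range `i` and initialises its accumulator, and the wrapper copies every input to the device before the launch. It assumes nvfortran with CUDA Fortran enabled (for example, a `.cuf` source file) and a GPU.

```fortran
module nbr_sum_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: for each point i, sum val over its neighbours.
  ! nbr_count(i) = number of neighbours of i; nbr_list(:, i) holds their indices.
  attributes(global) subroutine nbr_sum(n, max_nbr, nbr_count, nbr_list, val, res)
    integer, value :: n, max_nbr
    integer, device, intent(in) :: nbr_count(n), nbr_list(max_nbr, n)
    real(8), device, intent(in) :: val(n)
    real(8), device, intent(out) :: res(n)
    integer :: i, k
    real(8) :: acc
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i > n) return          ! guard: the last block may have surplus threads
    acc = 0.0d0                ! per-thread accumulator, initialised before the loop
    do k = 1, nbr_count(i)
      acc = acc + val(nbr_list(k, i))
    end do
    res(i) = acc
  end subroutine nbr_sum

  ! Hypothetical host wrapper: copy inputs to the device, launch, check errors, copy back.
  subroutine run_nbr_sum(nbr_count, nbr_list, val, res, tpb)
    integer, intent(in) :: nbr_count(:), nbr_list(:, :), tpb
    real(8), intent(in) :: val(:)
    real(8), intent(out) :: res(:)
    integer, device, allocatable :: nbr_count_d(:), nbr_list_d(:, :)
    real(8), device, allocatable :: val_d(:), res_d(:)
    integer :: n, max_nbr, istat
    n = size(val)
    max_nbr = size(nbr_list, 1)
    allocate(nbr_count_d(n), nbr_list_d(max_nbr, n), val_d(n), res_d(n))
    nbr_count_d = nbr_count    ! host-to-device copies; without them the
    nbr_list_d  = nbr_list     ! kernel would read uninitialised device memory
    val_d       = val
    call nbr_sum<<<(n + tpb - 1) / tpb, tpb>>>(n, max_nbr, nbr_count_d, nbr_list_d, val_d, res_d)
    istat = cudaGetLastError()
    if (istat == cudaSuccess) istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
      print *, cudaGetErrorString(istat)
      stop 1
    end if
    res = res_d                ! device-to-host copy
    deallocate(nbr_count_d, nbr_list_d, val_d, res_d)
  end subroutine run_nbr_sum
end module nbr_sum_mod
```

The accumulator is named `acc` rather than `sum` to avoid shadowing the `SUM` intrinsic, and it is reset for every thread. If `sum` in the original loop is not initialised elsewhere, that would be another source of garbage results.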

If you can post a reproducing example, that might help with spotting the error.

-Mat

Hi Mat,

Thanks for the prompt reply. I got a “too many resources requested for launch” error. I had to reduce the number of threads per block, and now it is working.

Thanks for all the help and support, you guys are amazing.

You’re very welcome and thanks for the compliment!

I’m not really sure why commenting out that one loop worked around this issue, but I’m glad you were able to determine the actual cause of the error.
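On the “too many resources requested for launch” error reported above: one plausible, unconfirmed explanation is per-thread register usage. The loop adds work, the compiler may then assign more registers per thread, and threads per block × registers per thread must fit within the per-block register limit (65,536 32-bit registers on many recent GPUs; check your device's actual limit). Commenting out the loop could have let the compiler drop that work and lower the count. The host-only sketch below does this arithmetic and computes the matching grid size. The module and function names, and the 65,536 figure used in the example, are assumptions rather than values from this thread.

```fortran
module launch_config
  implicit none
contains
  ! Ceiling division: number of blocks so that nblocks * tpb >= n.
  pure integer function num_blocks(n, tpb)
    integer, intent(in) :: n, tpb
    num_blocks = (n + tpb - 1) / tpb
  end function num_blocks

  ! Largest threads-per-block (a multiple of the 32-thread warp size) whose
  ! register demand fits in regs_per_block, capped at the 1024-thread limit.
  ! Simplified: ignores the hardware's register allocation granularity.
  pure integer function max_tpb(regs_per_thread, regs_per_block)
    integer, intent(in) :: regs_per_thread, regs_per_block
    max_tpb = regs_per_block / regs_per_thread
    max_tpb = min(1024, (max_tpb / 32) * 32)
  end function max_tpb
end module launch_config
```

Under these assumptions, a kernel using 100 registers per thread fits at most 640 threads per block, so a launch with 1024 threads per block would fail while 512 or 256 would succeed; with 128 threads per block, 1000 points need 8 blocks.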