Running a loop in a kernel (CUDA Fortran)

srikanthcs05 · January 24, 2019, 1:24pm

Hello everyone,

I recently started working with CUDA Fortran and have a basic grasp of the syntax and of how the GPU works. I am now converting my existing serial code to GPU code, and I have a small question:

So this is my global subroutine:

attributes(global) subroutine fpi_solver(prim_d, q_d, dq_d, flux_res_d, x_d, nbhs_d, conn_d)

Later I get the index:

i = (blockIdx%x-1)* blockDim%x + threadIdx%x

Now I have a loop operation for this point i over its neighbors,

do k = 1, neighbours(i)
   sum = sum + value(k)
end do

I tried to use the loop directly, but for some reason the kernel is not getting invoked. If I comment out the loop, it is. I apologize for my ignorance, but is there something I am missing? Do you have any suggestions as to how I can proceed with this problem?

Thank you very much,

Srikanth

MatColgrove · January 24, 2019, 6:38pm

I directly tried to use the loop but for some reason, this kernel is not getting invoked.

Is it not getting invoked or is it erroring?

CUDA kernels fail silently, so it’s a good idea to check for errors after the kernel launch. Try adding something like the following:

call fpi_solver<<<...>>>(..args..)
! cudaGetLastError returns a nonzero code if the launch failed
istat = cudaGetLastError()
if (istat .ne. 0) then
   print*, cudaGetErrorString(istat)
   stop istat
endif

I suspect that it’s a problem with “neighbours(i)”. You might be going out of bounds (are you checking that “i” is a valid index?), or its values aren’t getting updated on the device, so it contains garbage.
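As a rough illustration of the bounds check mentioned above, here is a minimal sketch of a kernel that guards the thread index before touching any array. The kernel name, the argument list, and the two-dimensional layout of value_d are assumptions made for the example, not the original fpi_solver code.

```fortran
module neighbour_sum_sketch
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: sums up to max_nbhs values for each point i.
  ! n, max_nbhs, and the (max_nbhs, n) layout of value_d are assumptions.
  attributes(global) subroutine sum_neighbours(n, max_nbhs, nbhs_d, value_d, sum_d)
    integer, value :: n, max_nbhs
    integer :: nbhs_d(n)            ! neighbour count per point
    real :: value_d(max_nbhs, n)    ! per-neighbour values for each point
    real :: sum_d(n)                ! result, one entry per point
    integer :: i, k
    real :: s

    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x

    ! The grid usually has more threads than points, so the extra
    ! threads must skip the work instead of indexing past the arrays.
    if (i <= n) then
      s = 0.0
      do k = 1, nbhs_d(i)
        s = s + value_d(k, i)
      end do
      sum_d(i) = s
    end if
  end subroutine sum_neighbours
end module neighbour_sum_sketch
```

Accumulating into a local variable and writing sum_d(i) once at the end also keeps the loop body free of repeated global-memory writes.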

If you can post a reproducing example, that might help with spotting the error.

-Mat

srikanthcs05 · January 25, 2019, 3:39am

Hi Mat,

Thanks for the prompt reply. I got a “too many resources requested for launch” error. I had to reduce the number of threads per block, and now it is working.
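For anyone hitting the same error: it generally means that one block needs more of some per-block resource (commonly registers) than the device can provide, so a smaller block size can help. The host-side sketch below queries the device limits and derives a launch configuration; the point count and the block size of 128 are assumed values chosen only for illustration.

```fortran
program launch_config_sketch
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, n, threads_per_block, num_blocks

  ! Query the limits of device 0
  istat = cudaGetDeviceProperties(prop, 0)
  if (istat .ne. 0) then
    print *, cudaGetErrorString(istat)
    stop 1
  end if
  print *, 'max threads per block: ', prop%maxThreadsPerBlock
  print *, 'registers per block:   ', prop%regsPerBlock

  n = 100000                 ! hypothetical number of points
  threads_per_block = 128    ! assumed block size, well below the maximum

  ! Round up so that every point gets a thread
  num_blocks = (n + threads_per_block - 1) / threads_per_block
  print *, 'blocks: ', num_blocks, ' threads per block: ', threads_per_block

  ! The kernel would then be launched as
  !   call fpi_solver<<<num_blocks, threads_per_block>>>(..args..)
end program launch_config_sketch
```

Because the block count is rounded up, the last block can contain threads with an index larger than n, which is why the kernel itself should check its index.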

Thanks for all the help and support, you guys are amazing.

MatColgrove · January 25, 2019, 8:25pm

You’re very welcome and thanks for the compliment!

I’m not really sure why commenting out that one loop worked around the issue, but I’m glad you were able to determine the actual cause of the error.
