Wrong results when changing kernel size

I am using CUDA Fortran. When the thread block size for the kernel is set a little larger, I get completely wrong results. The launch configuration is as follows:

tBlock=dim3(32,8,1)
grid=dim3(ceiling(real(kdim-1)/tBlock%x),ceiling(real(idim-1)/tBlock%y),1)
call kernel<<<grid,tBlock>>>(...)

When tBlock is changed to “dim3(32,16,1)”, the result is wrong. However, with “dim3(32,8,1)”, “dim3(32,2,1)”, “dim3(32,1,1)”, or “dim3(1,1,1)”, the results are all correct.
Besides, there are about 40 automatic variables in the kernel.
What could be causing the error?

Hi xll_bit,

Are you checking the error code after executing the kernel?

It’s possible that you’re encountering some resource limit such as using too much heap (due to the automatics) and the kernel is failing.
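
To see how close a launch might be to those limits, something like the following could be run on the host. This is only a sketch: using device 0 and doubling the heap size are assumptions for illustration, not values taken from the original program. It prints the per-block limits reported by cudaGetDeviceProperties and the current device heap size, then raises the heap limit. A block of 32x16 = 512 threads needs twice the registers per block of a 32x8 block, so a kernel with many local variables may fit in one configuration and not in the other.

```fortran
program query_limits
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat
  integer(kind=cuda_count_kind) :: heapBytes

  ! Device 0 is an assumption; use the device your program actually runs on.
  istat = cudaGetDeviceProperties(prop, 0)
  if (istat /= cudaSuccess) then
     write(*,*) 'cudaGetDeviceProperties failed: ', cudaGetErrorString(istat)
     stop 1
  end if

  ! Per-block limits that a launch configuration must stay within.
  write(*,*) 'Max threads per block:   ', prop%maxThreadsPerBlock
  write(*,*) 'Registers per block:     ', prop%regsPerBlock
  write(*,*) 'Shared memory per block: ', prop%sharedMemPerBlock

  ! Current size of the device heap, in bytes.
  istat = cudaDeviceGetLimit(heapBytes, cudaLimitMallocHeapSize)
  if (istat /= cudaSuccess) then
     write(*,*) 'cudaDeviceGetLimit failed: ', cudaGetErrorString(istat)
     stop 1
  end if
  write(*,*) 'Device heap size (bytes):', heapBytes

  ! Doubling is an arbitrary illustrative choice. The limit should be
  ! raised before the first kernel that allocates from the heap is launched.
  heapBytes = 2 * heapBytes
  istat = cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes)
  if (istat /= cudaSuccess) then
     write(*,*) 'cudaDeviceSetLimit failed: ', cudaGetErrorString(istat)
  end if
end program query_limits
```

If the numbers look tight, compiling with register usage reporting enabled and comparing it against the registers-per-block value divided by the number of threads in the block is a reasonable next step.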

It could also be a program error in your kernel where somehow a block y dimension of 16 causes an issue.
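
One common form of such an error, shown here only as a guess since the kernel source was not posted, is a missing bounds check. Because the grid is rounded up with ceiling, some threads in the last blocks fall outside the (kdim-1) x (idim-1) range, and how many do so depends on the block shape. The sketch below uses a hypothetical kernel (scale_kernel) and hypothetical array sizes; only the grid calculation is taken from the original post.

```fortran
module guard_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: doubles every element of a.
  attributes(global) subroutine scale_kernel(a, kdim, idim)
    integer, value :: kdim, idim
    real :: a(kdim-1, idim-1)
    integer :: k, i

    ! Global (1-based) indices of this thread.
    k = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    i = (blockIdx%y - 1) * blockDim%y + threadIdx%y

    ! Guard: threads in the padding of the rounded-up grid do nothing.
    if (k <= kdim-1 .and. i <= idim-1) then
       a(k, i) = 2.0 * a(k, i)
    end if
  end subroutine scale_kernel
end module guard_mod

program guard_demo
  use cudafor
  use guard_mod
  implicit none
  integer, parameter :: kdim = 101, idim = 51   ! hypothetical sizes
  real :: a(kdim-1, idim-1)
  real, device :: a_d(kdim-1, idim-1)
  type(dim3) :: grid, tBlock
  integer :: istat

  a = 1.0
  a_d = a                                       ! host-to-device copy

  tBlock = dim3(32, 16, 1)
  grid = dim3(ceiling(real(kdim-1)/tBlock%x), ceiling(real(idim-1)/tBlock%y), 1)

  call scale_kernel<<<grid, tBlock>>>(a_d, kdim, idim)
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) write(*,*) 'Kernel error: ', cudaGetErrorString(istat)

  a = a_d                                       ! device-to-host copy
  write(*,*) 'All elements doubled: ', all(a == 2.0)
end program guard_demo
```

Without the guard, the extra threads would read and write past the end of the array, which can corrupt results for some block shapes and appear harmless for others.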

-Mat

Hi Mat,
How do I know whether the kernel has failed? When a kernel fails, can the program continue running without any prompting?

Can it continue running without any prompting?

Yep. If a kernel fails, the host code won't know this unless you check whether the kernel returned an error condition.

How do I know whether the kernel has failed?

From: https://devblogs.nvidia.com/how-query-device-properties-and-handle-errors-cuda-fortran/

Add the following bit of code after your kernel launch:

call saxpy<<<grid,tBlock>>>(x_d, y_d, a)
ierrSync = cudaGetLastError()
ierrAsync = cudaDeviceSynchronize()
if (ierrSync /= cudaSuccess) write(*,*) &
  'Sync kernel error:', cudaGetErrorString(ierrSync)
if (ierrAsync /= cudaSuccess) write(*,*) &
  'Async kernel error:', cudaGetErrorString(ierrAsync)

or something like:

call kernel<<<...>>>()
istat = cudaGetLastError()
if (istat .ne. 0) then
    print *, "Kernel error: ",  cudaGetErrorString(istat)
    stop istat
endif
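
To make the check reusable, the two calls can be wrapped in a small helper. The following is a minimal sketch, not code from the original program: check_kernel and noop_kernel are made-up names, and the 32x64 block is chosen only because 2048 threads exceeds the 1024-threads-per-block limit of current GPUs, so the launch should be rejected. cudaGetLastError reports errors detected at launch time (such as a bad configuration or too many resources requested), while the return value of cudaDeviceSynchronize reports errors that occur while the kernel is running.

```fortran
module check_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel that does nothing; it exists only to be launched.
  attributes(global) subroutine noop_kernel()
  end subroutine noop_kernel

  ! Hypothetical helper: call right after a kernel launch.
  ! Returns .true. if the launch and the execution both succeeded.
  logical function check_kernel(label)
    character(len=*), intent(in) :: label
    integer :: ierrSync, ierrAsync

    ierrSync  = cudaGetLastError()        ! errors detected at launch
    ierrAsync = cudaDeviceSynchronize()   ! errors raised during execution

    if (ierrSync /= cudaSuccess) write(*,*) &
      label, ': launch error: ', cudaGetErrorString(ierrSync)
    if (ierrAsync /= cudaSuccess) write(*,*) &
      label, ': execution error: ', cudaGetErrorString(ierrAsync)

    check_kernel = (ierrSync == cudaSuccess .and. ierrAsync == cudaSuccess)
  end function check_kernel
end module check_mod

program check_demo
  use cudafor
  use check_mod
  implicit none
  type(dim3) :: grid, tBlock
  logical :: ok

  grid = dim3(1, 1, 1)

  ! A launch that should succeed: 32*8 = 256 threads per block.
  tBlock = dim3(32, 8, 1)
  call noop_kernel<<<grid, tBlock>>>()
  ok = check_kernel('32x8 block')
  write(*,*) '32x8 launch ok:  ', ok

  ! A launch that should be rejected: 32*64 = 2048 threads per block.
  ! Without the check, the program would simply carry on.
  tBlock = dim3(32, 64, 1)
  call noop_kernel<<<grid, tBlock>>>()
  ok = check_kernel('32x64 block')
  write(*,*) '32x64 launch ok: ', ok
end program check_demo
```

Note that the synchronize call forces the host to wait for the kernel, so a check like this is best kept for debugging or placed where the program would synchronize anyway.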

-Mat