Hi Mat,
To describe the problem in more detail, here is part of my kernel code:
      if(i_ortho.eq.0)then
        do m = 1, m_blk(myproc)
          i = threadidx%x + i_b(m) - 1
          j = blockidx%x + j_b(m) - 1
          k = blockidx%y + k_b(m) - 1
          if(i .LE. i_e(m) .and. j .LE. j_e(m)
     1       .and. k .LE. k_e(m))then
c Part 1
            vec_out(i,j,k,m) = 0.0
            vec_out(i,j,k,m) = ( ap_dev(19,i,j,k,m) * vec_in(i,j,k,m)
     1        - ( ap_dev(3,i,j,k,m) * vec_in(i+1,j,k,m)
     1          + ap_dev(4,i,j,k,m) * vec_in(i-1,j,k,m)
     1          + ap_dev(1,i,j,k,m) * vec_in(i,j+1,k,m)
     1          + ap_dev(2,i,j,k,m) * vec_in(i,j-1,k,m)
     1          + ap_dev(5,i,j,k,m) * vec_in(i,j,k+1,m)
     1          + ap_dev(6,i,j,k,m) * vec_in(i,j,k-1,m)
     1          + ap_dev(7,i,j,k,m) * vec_in(i+1,j+1,k,m)
     1          + ap_dev(8,i,j,k,m) * vec_in(i-1,j+1,k,m))
     1        ) * sps_dev(i,j,k,m)
c Part 2
            vec_out(i,j,k,m) = vec_out(i,j,k,m)
     1        + ( ap_dev(9,i,j,k,m) * vec_in(i+1,j-1,k,m)
     1          + ap_dev(10,i,j,k,m) * vec_in(i-1,j-1,k,m)
     1          + ap_dev(11,i,j,k,m) * vec_in(i,j+1,k+1,m)
     1          + ap_dev(12,i,j,k,m) * vec_in(i,j+1,k-1,m)
     1          + ap_dev(13,i,j,k,m) * vec_in(i+1,j,k+1,m)
     1          + ap_dev(14,i,j,k,m) * vec_in(i+1,j,k-1,m)
     1          + ap_dev(15,i,j,k,m) * vec_in(i,j-1,k+1,m)
     1          + ap_dev(16,i,j,k,m) * vec_in(i,j-1,k-1,m)
     1          + ap_dev(17,i,j,k,m) * vec_in(i-1,j,k+1,m)
     1          + ap_dev(18,i,j,k,m) * vec_in(i-1,j,k-1,m)
     1        ) * sps_dev(i,j,k,m)
          end if
        end do
      elseif(i_ortho.eq.1)then
... etc
The output from ptxinfo is:
ptxas info : Used 41 registers, 100+16 bytes smem, 496 bytes cmem[0], 4 bytes cmem[1]
The interesting part is that the problem I am solving never enters the first if-block (in other words, i_ortho = 1). Even though that branch is never executed, the kernel will not launch and fails with the error: Too many resources requested for launch.
The only workaround I have found is to comment out Part 2 of the first if-block. With Part 2 commented out, ptxinfo reports:
ptxas info : Used 32 registers, 100+16 bytes smem, 496 bytes cmem[0], 4 bytes cmem[1]
And the code runs correctly. However, in cases where i_ortho = 0, the calculation is then incomplete.
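A rough register-budget check on the two ptxinfo results, assuming a hypothetical block of 512 threads and a limit of 16384 32-bit registers per block (typical of compute capability 1.2/1.3 devices; the actual launch configuration is not shown here). The register count reported by ptxas applies to the whole kernel, so the unexecuted i_ortho = 0 branch still counts against it:

```latex
\begin{aligned}
41 \times 512 &= 20992 > 16384 &&\text{(launch fails)}\\
32 \times 512 &= 16384 \le 16384 &&\text{(launch fits)}
\end{aligned}
```

Register allocation granularity on real hardware makes the exact limit somewhat stricter than this simple product, so treat it as an approximation.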
Do you have any suggestions on how to handle this?
Thanks,
-Chris