I observed some strange behavior from nvcc with my cellular automaton kernel.
The kernel runs in emulation mode without errors, but in device mode whether it launches depends on where I put one particular statement in the code.
This statement has no effect on the rest of the code, and the rest of the code has no effect on it.
When it is in the wrong position, I get this error:
>Cuda error: Kernel_11: too many resources requested for launch.
Here is the kernel … I modified the Sobel example for my purposes, but the names of some variables stayed the same …
__global__ void
CA_Kernel_11( Pixel *pSobelOriginal, unsigned int *cost, unsigned int *cost_temp, unsigned int *bit_mask, unsigned int *bit_mask_write, unsigned int pitch,
int w, int h, float fScale )
{
unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
__shared__ unsigned int tmp[16];
if(threadIdx.x == 0)
{
tmp[threadIdx.y] = bit_mask[y*(pitch/32)+(x/32)];
bit_mask_write[y*(pitch/32)+(x/32)] = 1<<threadIdx.x;
}
__syncthreads();
if((tmp[threadIdx.y] & (1 << threadIdx.x)) >> threadIdx.x == 1)
{
unsigned int up = (UINT_MAX-65540);
unsigned int down = (UINT_MAX-65540);
unsigned int left = (UINT_MAX-65540);
unsigned int right = (UINT_MAX-65540);
unsigned int min_val = (UINT_MAX-65540);
// center node
unsigned int center = cost[y*pitch + x];
center = tex2D( tex_cost, x, y);
// > the kernel runs when the statement is here< 

//up = cost[(y-1)*pitch + x];}
up = tex2D( tex_cost, x+0, y-1 );
left = tex2D( tex_cost, x-1, y+0 );
right = tex2D( tex_cost, x+1, y+0 );
down = tex2D( tex_cost, x+0, y+1 );

// > this statement <
if(threadIdx.x == 0){bit_mask_write[y*(pitch/32)+(x/32)] = 1<<threadIdx.x;}
// Weights
up += tex2D( tex, x+0, y-1 ) << 8;
left += tex2D( tex, x-1, y+0 ) << 8;
right += tex2D( tex, x+1, y+0 ) << 8;
down += tex2D( tex, x+0, y+1 ) << 8;
min_val = min(
min(
min(up & 0xffffff00, right & 0xffffff00),
min(down & 0xffffff00,left & 0xffffff00)), center & 0xffffff00);
if(min_val ==(center & 0xffffff00))
{
cost_temp[y*pitch + x] = center;
}
else if(min_val == (right & 0xffffff00))
{
cost_temp[y*pitch + x] = right;
}
else if(min_val == (down & 0xffffff00))
{
cost_temp[y*pitch + x] = down;
}
else if(min_val == (left & 0xffffff00))
{
cost_temp[y*pitch + x] = left;
}
else if(min_val == (up & 0xffffff00))
{
cost_temp[y*pitch + x] = up;
}
}
}
The statement serves no purpose in either position. I was looking for the problem and tried moving it to different positions …
I start the kernel with:
dim3 dimBlock(32, 16, 1);
dim3 dimGrid(iw / dimBlock.x, ih / dimBlock.y, 1);
CA_Kernel_11<<<dimGrid, dimBlock>>>(odata, cost, cost_temp, bit_mask, bit_mask_2, iw, iw, ih, fScale );
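One way to narrow this down is to ask the runtime what the compiled kernel needs. "Too many resources requested for launch" is usually reported when the block size is too large for the kernel's per-thread register count. The sketch below is only an illustration under assumptions: it needs a toolkit that provides `cudaFuncGetAttributes`, and `probeKernel` and `blockFits` are hypothetical stand-ins that are not part of the code above (in practice `CA_Kernel_11` would be passed instead). Whether register usage is really what changes when the statement is moved is a guess that this check could confirm or rule out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for CA_Kernel_11; any __global__ function
// can be queried in the same way.
__global__ void probeKernel(unsigned int *out)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = threadIdx.x;
}

// Hypothetical helper: true when a block of threadsPerBlock threads stays
// within the per-kernel limit reported by the runtime.
bool blockFits(const cudaFuncAttributes &attr, unsigned int threadsPerBlock)
{
    return threadsPerBlock <= (unsigned int)attr.maxThreadsPerBlock;
}

int main()
{
    // Query the resource usage of the compiled kernel on the current device.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, probeKernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Same block shape as in the launch above: 32 * 16 = 512 threads.
    dim3 dimBlock(32, 16, 1);
    unsigned int threads = dimBlock.x * dimBlock.y * dimBlock.z;

    printf("registers per thread:          %d\n", attr.numRegs);
    printf("static shared memory (bytes):  %zu\n", attr.sharedSizeBytes);
    printf("max threads per block (kernel): %d\n", attr.maxThreadsPerBlock);
    printf("requested threads per block:   %u (%s)\n", threads,
           blockFits(attr, threads) ? "fits" : "exceeds the limit");
    return 0;
}
```

Compiling with `nvcc --ptxas-options=-v` prints the per-thread register count at build time, so the two placements of the statement can be compared directly. If the count differs between them, a smaller block (for example 32 x 8) would be one thing to try.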
I don’t really know where the problem is, and maybe I’m just too tired …
Best regards,
capjo