Kernel execution unknown error when number of blocks > 302

I get this error when executing a kernel with 256 threads per block and more than 302 blocks; with 302 blocks or fewer it runs without the error. The ptxas info for the kernel is:
ptxas info : Used 32 registers, 80+80 bytes smem, 48 bytes cmem[1]
cutilCheckMsg cudaThreadSynchronize error: Kernel execution failed in file <template.cu>, line 193 : unknown error
The kernel itself is 30 lines long and calls 13 device functions totaling approximately 286 lines.

Edit:
Sorry, I forgot to mention my setup:
Elitegroup GeForce 9800 GT (compute capability 1.1)
Dual-core 3.2 GHz CPU
2 GB RAM
I hope that helps. Thanks in advance.

I tried the 181.20, 182.08, and 182.50 drivers.
I also tried a laptop with an 8600M GS using the 181.22 beta drivers.
Same error. Any help is really appreciated.

Your kernel uses 32 registers per thread, and a compute capability 1.1 multiprocessor has 8192 registers, so the maximum block size is 8192 / 32 = 256 threads per block.
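
A quick sketch of that arithmetic, assuming the 8192-register file of a compute capability 1.0/1.1 multiprocessor and ignoring register allocation granularity. The helper max_threads_by_registers and the placeholder Kernel are hypothetical names; cudaFuncGetAttributes is the runtime call that reports the register count the compiler assigned to a kernel, which should match the ptxas output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: upper bound on threads per block set by registers alone.
// It ignores register allocation granularity and the hardware cap on block size.
int max_threads_by_registers(int regs_per_multiprocessor, int regs_per_thread)
{
    if (regs_per_thread <= 0) return 0;
    return regs_per_multiprocessor / regs_per_thread;
}

// Placeholder kernel standing in for the real one (hypothetical).
__global__ void Kernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.0f;
}

// Prints the register count and block-size limit the runtime reports for Kernel.
void report_kernel_limits()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, Kernel);
    if (err != cudaSuccess) {
        std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return;
    }
    std::printf("registers per thread: %d, max threads per block: %d\n",
                attr.numRegs, attr.maxThreadsPerBlock);
}
```

With 32 registers per thread, max_threads_by_registers(8192, 32) gives 256, the same limit as above.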

What's the execution time of the kernel? Are you hitting the display driver's watchdog timer (5 sec)?
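
One way to answer the timing question is to bracket the launch with CUDA events. This is a sketch under stated assumptions: busyKernel is a hypothetical placeholder for the real kernel and time_kernel_ms is a hypothetical helper. A negative result means the launch or the execution failed; a time close to 5 seconds on a GPU that also drives a display would point at the watchdog.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder standing in for the real kernel.
__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Returns elapsed GPU time in milliseconds, or -1 if the kernel failed.
float time_kernel_ms(float *d_data, int n, dim3 grid, dim3 threads)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    busyKernel<<<grid, threads>>>(d_data, n);
    cudaEventRecord(stop, 0);

    // Blocks the host until the kernel and the stop event have completed.
    cudaError_t err = cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (err == cudaSuccess && cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```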

Thanks, AndreiB and Jeroen, for your replies.
For Jeroen:
I think the kernel never runs. I tried this:

dim3 grid(303, 1, 1);   // 302 blocks run fine, 303 fail
dim3 threads(256, 1, 1);
system("PAUSE");
Kernel<<<grid, threads>>>(psx, psy, stepx, stepy, pex - psx, pz, pnmesh, doffsets, bmp, pnumberx, pstartx, pstarty);

When system("PAUSE") waits and I press a key, the error message appears immediately, with no noticeable delay.
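
To tell whether a failure like this happens at launch or while the kernel runs, the launch can be checked in two steps. In this sketch, launch_and_check and the stand-in Kernel are hypothetical names; cudaDeviceSynchronize is the current name for the deprecated cudaThreadSynchronize that appears in the error message.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real Kernel.
__global__ void Kernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Reports launch errors separately from execution errors.
// Returns cudaSuccess only if both the launch and the execution succeeded.
cudaError_t launch_and_check(int *d_out, int n, dim3 grid, dim3 threads)
{
    Kernel<<<grid, threads>>>(d_out, n);

    // Catches launch configuration problems (e.g. too many threads per block).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Waits for the kernel; faults that occur while it runs (such as bad
    // pointer dereferences) are typically reported here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        std::fprintf(stderr, "execution failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

An error that shows up only at the synchronize step points at something the kernel does while running rather than at the launch configuration.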

For AndreiB: I already use 256 threads per block.

Then check for out-of-bounds memory accesses; that is the most likely cause of this behaviour.
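
A common way to rule out the out-of-bounds case is to have every thread compute its global index and skip work past the end of the data, and to size the grid with a rounded-up division. The sketch assumes a one-dimensional array of n floats; blocks_for and scale are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements.
inline int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

// Each thread computes its global index and returns early if it falls past
// the end of the array, so extra threads in the last block never touch memory.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;   // guard against out-of-bounds access
    data[i] *= factor;
}
```

For example, 77568 elements at 256 threads per block need blocks_for(77568, 256) = 303 blocks, launched as scale<<<303, 256>>>(d_data, 77568, 2.0f).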

Sorry for the late reply, but I just found the problem: I was passing a host pointer to the kernel.
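
For reference, the usual pattern is to allocate device memory with cudaMalloc, copy the host data into it, and pass the device pointer to the kernel. The names addOne and add_one_on_gpu are hypothetical; the sketch only illustrates the host/device pointer split.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that updates an array in device memory.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Copies host data to the device, runs the kernel on the device pointer,
// and copies the result back. Returns false if any CUDA call fails.
bool add_one_on_gpu(int *h_data, int n)
{
    int *d_data = nullptr;   // device pointer, valid only inside kernels
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Passing h_data here instead of d_data is the bug described above.
        addOne<<<(n + 255) / 256, 256>>>(d_data, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    return ok;
}
```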

Thanks very much.