Big loop and memory writing

Here’s a simple piece of code that causes a terrible runtime error.

The screen goes black, and after that thousands of colorful dots appear on my screen.

What causes the problem?

I call the kernel with

test<<<8,256>>>();

[codebox]

__global__ void test()
{
    int PartIdx;
    PartIdx = blockIdx.x*blockDim.x + threadIdx.x;

    __shared__ float a[2048];

    // every thread writes its own element (PartIdx is 0..2047 for <<<8,256>>>)
    for (int i=0;i<100000000;i++){
        a[PartIdx]=1.0f;
    }
}

[/codebox]

And this code doesn’t produce any error.

I really have no idea what the problem is…

[codebox]

__global__ void test()
{
    int PartIdx;
    PartIdx = blockIdx.x*blockDim.x + threadIdx.x;

    __shared__ float a[2048];

    // every thread writes the same element, a[0]
    for (int i=0;i<100000000;i++){
        a[0]=1.0f;
    }
}

[/codebox]

I tested your code on my computer:

Windows XP Pro 64-bit, VC2005, Tesla C1060, driver 190.38, CUDA 2.3

With your launch configuration ( test<<<8,256>>>() ) the kernel takes 9 s and the program works fine.

I also tested your code on one GPU of a GTX 295; it takes 9.5 s and works fine.

I think this is a watchdog problem.

Could you run deviceQuery from the SDK examples to check your configuration?

Look for the field “Run time limit on kernels:”

If this field is Yes, then you have a watchdog problem.
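
The same field can also be read from the runtime API instead of running deviceQuery. This is only a minimal sketch, assuming that the `kernelExecTimeoutEnabled` member of `cudaDeviceProp` is the value behind “Run time limit on kernels:”; the helper name `reportKernelTimeLimits` is made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the SDK): prints the watchdog flag of
// every CUDA device. Returns the number of devices that report a run time
// limit, or -1 if a runtime call fails.
int reportKernelTimeLimits()
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int limited = 0;
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return -1;
        }
        // Assumption: this flag is what deviceQuery shows as Yes/No.
        printf("Device %d (%s): run time limit on kernels: %s\n",
               dev, prop.name, prop.kernelExecTimeoutEnabled ? "Yes" : "No");
        if (prop.kernelExecTimeoutEnabled) {
            ++limited;
        }
    }
    return limited;
}
```

Calling `reportKernelTimeLimits()` once at program start is enough to see whether the device that drives the display has the limit set.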

Thank you, I’ll try it next Monday.

If you’re using Vista, I think they have a fixed limit on the maximum runtime of kernels / graphics driver calls, not only for CUDA but DirectX etc. also.
[url="http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx"]http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx[/url]

Maybe the first code couldn’t be optimized that much, and so didn’t stay inside the time limit. (Does the blank screen occur only after a few seconds? And how long does the first version run?)
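
To answer the "how long does it run" question with numbers, the kernel can be timed with CUDA events while checking the error status. This is a rough sketch, not the original program: the iteration count is turned into a parameter so a short run can be tried first, and `timeKernelMs` is a made-up helper name. If the watchdog kills the kernel, the synchronize call would be expected to return an error such as `cudaErrorLaunchTimeout` rather than a time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The first kernel from the thread, with the trip count as a parameter
// (an assumption made here so that short runs are possible).
__global__ void test(int iterations)
{
    int PartIdx = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ float a[2048];
    for (int i = 0; i < iterations; i++) {
        a[PartIdx] = 1.0f;
    }
}

// Hypothetical helper: returns the kernel time in milliseconds,
// or -1.0f if the launch or the kernel itself failed.
float timeKernelMs(int iterations)
{
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    cudaEventRecord(start, 0);
    test<<<8, 256>>>(iterations);
    cudaError_t err = cudaGetLastError();      // launch configuration errors
    cudaEventRecord(stop, 0);
    if (err == cudaSuccess) {
        err = cudaEventSynchronize(stop);      // errors raised while running
    }

    float ms = -1.0f;
    if (err == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    } else {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Starting with a small value such as `timeKernelMs(1000000)` and increasing it step by step shows roughly at which run time the screen blanks, without hitting the full loop straight away.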

Thank you for the reply.

I’m using Windows 7 Enterprise.

I just want to understand what the problem is, so that I can find a different way to organize my algorithm.

Yes, the blank screen occurs only after a few seconds.

The first version runs for a few seconds before the blank screen occurs. I also tried it under Linux, where it runs fine and takes about 10 s, so I think it could be the watchdog problem.
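
If the watchdog is indeed the cause, one common way to reorganize such an algorithm is to split the long loop over several short kernel launches, so that no single launch runs for seconds. The following is only a sketch under that assumption: `testChunk`, `chunkCount` and `runInChunks` are hypothetical names, and the iterations per launch would have to be tuned for the actual GPU. Shared memory does not survive between launches, so a real algorithm would need to keep its state in global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same loop body as the first kernel, but each launch only runs
// `iterations` passes instead of all 100000000.
__global__ void testChunk(int iterations)
{
    int PartIdx = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ float a[2048];
    for (int i = 0; i < iterations; i++) {
        a[PartIdx] = 1.0f;
    }
}

// Number of launches needed to cover `total` iterations (rounds up).
int chunkCount(int total, int perLaunch)
{
    return (total + perLaunch - 1) / perLaunch;
}

// Runs `total` iterations as a series of short launches.
// Returns true if every launch completed without error.
bool runInChunks(int total, int perLaunch)
{
    int launches = chunkCount(total, perLaunch);
    for (int c = 0; c < launches; ++c) {
        int remaining = total - c * perLaunch;
        int n = remaining < perLaunch ? remaining : perLaunch;

        testChunk<<<8, 256>>>(n);

        // Wait for this chunk so that each launch stays short.
        // (Toolkits older than CUDA 4.0 use cudaThreadSynchronize here.)
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            printf("chunk %d failed: %s\n", c, cudaGetErrorString(err));
            return false;
        }
    }
    return true;
}
```

For example, `runInChunks(100000000, 1000000)` would issue 100 launches of one million iterations each in place of the single `test<<<8,256>>>()` call.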

The strange thing is that deviceQuery returns this:
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes

Does this mean I should not have a run time limit?