Odd problem: nested loop in a CUDA kernel seems not to work

I noticed a topic posted by coderart.

I may have the same problem he does.

My problem is that a nested loop whose iteration count is not known at compile time does not seem to work inside a kernel.

It sometimes works if the iteration count is known to the kernel.

I do not know how to fix it.

Any help would be appreciated.

I suspect that there is a coding error somewhere. What I suggest you do is, within the kernel, write out the number of iterations the for loop must run for each thread, then copy that back to the host and compare notes. If you don't encounter a crash or driver reboot, then it's likely your kernel is doing nothing, since the watchdog timer isn't timing out. You may find that your iteration limit is being evaluated to 0, or that one of your threads is misbehaving. Be aware that the compiler will unroll loops whose trip count is known at compile time, e.g. for(int i = 0; i < 10; i++), which iterates 10 times irrespective of runtime conditions. This topic is covered in the programming guide and the compiler manual.
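To make the suggestion above concrete, here is a minimal sketch of recording per-thread iteration counts and comparing them on the host. The names `countIterations`, `expectedIterations` and `checkIterationCounts` are made up for illustration, the triple loop is only a stand-in for the real kernel body, and it assumes `depth` is small enough that `depth*depth*depth` fits in an `unsigned int`.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: each thread runs the triple
// loop and records how many innermost iterations it actually executed.
__global__ void countIterations(int depth, unsigned int *counts, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    unsigned int iters = 0;
    for (int x = 1; x <= depth; x++)
        for (int y = 1; y <= depth; y++)
            for (int z = 1; z <= depth; z++)
                iters++;            // the real loop body would go here

    counts[tid] = iters;
}

// Iteration count every thread should report for a given depth.
unsigned int expectedIterations(int depth)
{
    return (unsigned int)depth * depth * depth;
}

// Launches the kernel for n threads, copies the counts back, and returns
// the number of threads whose count differs from depth^3.
// Returns -1 if any CUDA call or the kernel itself reports an error.
int checkIterationCounts(int depth, int n)
{
    unsigned int *dCounts = nullptr;
    size_t bytes = n * sizeof(unsigned int);
    if (cudaMalloc((void **)&dCounts, bytes) != cudaSuccess) return -1;
    cudaMemset(dCounts, 0, bytes);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    countIterations<<<blocks, threads>>>(depth, dCounts, n);

    // Launch errors show up in cudaGetLastError, execution errors
    // (including a watchdog timeout) at the synchronize call.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    std::vector<unsigned int> hCounts(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(hCounts.data(), dCounts, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dCounts);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; i++)
        if (hCounts[i] != expectedIterations(depth)) mismatches++;
    return mismatches;
}
```

Calling something like `checkIterationCounts(depth, numThreads)` from the host with the same `depth` the real kernel receives should show whether the loop bound arrives as expected: a return value of -1 points at a launch or execution error, and all counts being 0 would suggest the kernel never ran its loop.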

Thank you for your help.

I have also noticed the following:

The kernel contains a triple nested loop.

If I rewrite the triple nested loop in the kernel as a single loop, as the following pseudo-code shows,

CUDA seems to work correctly.

My guess is that CUDA first tries to estimate the number of loop iterations,

but a triple loop is hard to estimate, and depth in the following code segment is not a constant but a variable.

[codebox]

for(x=1;x<=depth;x++)

    for(y=1;y<=depth;y++)

         for(z=1;z<=depth;z++)

[/codebox]

=====>

[codebox]

for(i=1;i<=depth*depth*depth;i++)

    x=... y=... z=....

[/codebox]
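For readers who want to try the same rewrite, here is one possible way to recover x, y and z from the single loop counter. This is only a sketch, not the code used above: `Index3`, `unflatten` and `flatLoop` are made-up names, the counter is 0-based (unlike the 1-based `i` in the pseudo-code), and it assumes `depth*depth*depth` fits in an `int`.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper type holding the three 1-based loop indices.
struct Index3 { int x, y, z; };

// Maps a flat index i in [0, depth^3) to 1-based (x, y, z), with z varying
// fastest, i.e. the same visiting order as the original nested loops.
__host__ __device__ inline Index3 unflatten(int i, int depth)
{
    Index3 p;
    p.z = i % depth + 1;
    p.y = (i / depth) % depth + 1;
    p.x = i / (depth * depth) + 1;
    return p;
}

// Example kernel: every thread walks the flattened loop and stores the sum
// of x + y + z over all iterations, as a placeholder for the real body.
__global__ void flatLoop(int depth, int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int total = depth * depth * depth;   // runtime value, not a constant
    int sum = 0;
    for (int i = 0; i < total; i++) {
        Index3 p = unflatten(i, depth);
        sum += p.x + p.y + p.z;          // the real loop body would go here
    }
    out[tid] = sum;
}
```

With `depth = 3`, flat index 0 maps to (1, 1, 1), index 5 to (1, 2, 3), and index 26 to (3, 3, 3), matching the order of the nested version.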
