# Help for a simple testing problem

I just ran a simple test on the GPU, using only very simple operations, as follows:
```
for (i = 0; i < loops; ++i)
    for (j = 0; j < loops; ++j)
        for (k = 0; k < loops; ++k)
            for (l = 0; l < loops; ++l)
            {
                y = m*n;
                y = m-n;
                y = m+n;
                y = m*n;
                y = m*n;
            }
```

where `loops` is 1,000,000.

When I use `threads*blocks = 256*300`, the program crashed. I am wondering whether this also happens on your Linux system with 0.9+. Do you think `256*300` is too many? According to the user guide, it should be a very small number. Thanks a lot!
I guess that if there are too many operations in each thread, there are some problems in the memory management. So when the number of BLOCKS

!!! With 4 nested loops of 1 million iterations each, you are calculating the inner part of the loop (10^6)^4 = 1e24 times. Even assuming the inner loop could be done 100e9 times per second (it probably isn't even that fast), the calculation would take 10^13 seconds! That's over 300,000 years!
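The arithmetic above can be reproduced with a few lines of C. This is only a sketch: it takes the same optimistic rate as the text (100e9 inner-loop bodies per second) and assumes a 365-day year, and the helper name `estimated_years` is hypothetical. The iteration count is kept in a `double` because 10^24 does not fit in a 64-bit integer (whose maximum is about 1.8 × 10^19).

```c
/* Hypothetical helper: years of wall-clock time needed to execute the
   innermost body of `depth` nested loops, each running `loops` times,
   at `rate` bodies per second. The count is kept in a double because
   loops^depth (1e24 here) overflows every 64-bit integer type. */
double estimated_years(double loops, int depth, double rate)
{
    const double seconds_per_year = 365.0 * 24.0 * 3600.0; /* assumed 365-day year */
    double iterations = 1.0;
    for (int d = 0; d < depth; ++d)
        iterations *= loops;            /* 1e6^4 = 1e24 */
    double seconds = iterations / rate; /* 1e24 / 1e11 = 1e13 s */
    return seconds / seconds_per_year;
}
```

Calling `estimated_years(1e6, 4, 100e9)` gives roughly 317,000, in line with the "300,000 years" figure.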

If you are running X on the same GPU, there is a ~5 second limit on the run time of a single kernel, so a kernel like this one can never finish. Read the many threads on the forum about this if you want the gory details.

Thanks! Got it!

My problem is that I need lots of calculations, so CUDA might not be suitable for this.

No, CUDA is very good at doing lots of calculations. I would guess, however, that whatever calculations you are doing, they do not involve four nested for loops that each run 1 million times. Think about the code you have written above.

The innermost loop runs 1 million times,

then the second loop increments by 1 and the innermost loop runs another million times.

~1 million * 1 million times later, the third loop increments by 1.

~1 million * 1 million * 1 million times later, the fourth loop increments by 1.

~1 million * 1 million * 1 million * 1 million times later, the program finishes.

That is the first problem with your code.
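The growth described above can be checked with a scaled-down version of the same loop nest. In this sketch the helper name `count_nested` is hypothetical; it simply counts how often each level of a four-deep nest is entered for a small `loops` value.

```c
/* Hypothetical helper: run a four-level loop nest with a small `loops`
   value and count how often each level's body is entered.
   counts[0] is the outermost loop (i), counts[3] the innermost body (l). */
void count_nested(long loops, long counts[4])
{
    for (int d = 0; d < 4; ++d)
        counts[d] = 0;

    for (long i = 0; i < loops; ++i) {
        ++counts[0];                    /* loops times */
        for (long j = 0; j < loops; ++j) {
            ++counts[1];                /* loops^2 times */
            for (long k = 0; k < loops; ++k) {
                ++counts[2];            /* loops^3 times */
                for (long l = 0; l < loops; ++l)
                    ++counts[3];        /* innermost body: loops^4 times */
            }
        }
    }
}
```

With `loops = 10` the innermost body already runs 10,000 times. Each factor of 10 added to `loops` multiplies that by 10^4, so `loops = 1,000,000` gives 10^24 executions.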

A good optimizing compiler, however (I don't know whether the CUDA compiler would have done this if you had asked it to optimize), would have simplified your code to the following:

```
{
    y = m*n;
    y = m-n;
    y = m+n;
    y = m*n;
    y = m*n;
}
```

Seeing as you do not actually use the values i, j, k and l anywhere within the loops, every pass through the body produces exactly the same result.
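This can be checked directly. The sketch below assumes `m`, `n` and `y` are plain `int`s (the question never shows their declarations) and uses the hypothetical helper names `looped` and `collapsed`. It runs the nest with a small `loops` value and compares the result with a single pass of the body. Note that the simplification only holds when `loops > 0`: with zero iterations the original never touches `y`.

```c
/* The loop body from the question. y is only written, never read,
   and none of i, j, k, l appears in it. Types are assumed to be int. */
int looped(int m, int n, int loops, int y)
{
    for (int i = 0; i < loops; ++i)
        for (int j = 0; j < loops; ++j)
            for (int k = 0; k < loops; ++k)
                for (int l = 0; l < loops; ++l) {
                    y = m * n;
                    y = m - n;
                    y = m + n;
                    y = m * n;
                    y = m * n;
                }
    return y;
}

/* The simplified form: a single pass of the body. Each assignment
   overwrites the previous one, so the final value is just m * n. */
int collapsed(int m, int n)
{
    int y;
    y = m * n;
    y = m - n;
    y = m + n;
    y = m * n;
    y = m * n;
    return y;
}
```

For example, `looped(3, 4, 5, 0)` and `collapsed(3, 4)` both return 12, while `looped(3, 4, 0, -1)` returns the untouched `-1`.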

Chris