CUDA kernel not running on Windows XP

Mesher · September 22, 2008, 8:54am

I am running CUDA on Windows XP with a GeForce 9800.
I use Visual C++ 2005.
My kernel runs inside a loop.
If I give it a search window parameter of 5, it works.
However, if I give it a search window parameter of 50, there is no data in the result memory (the image comes out black).
If I move the kernel out of the loop and run just one iteration, still with 50 as the parameter, it works.
So I thought that I am doing too much work and CUDA just aborts it.
I thought it might have something to do with the 5-second limit, but I am not sure it reaches 5 seconds.
However, the loop runs on the CPU (isn’t the limit supposed to apply to kernels and not to code running on the CPU?).
If it is the 5-second limit, do I need to select which card CUDA runs on, and have one card with no screens attached to it?

Thank you.

Mesher · September 24, 2008, 1:12pm

Maybe I am missing something, but I believe there are some limitations on kernels that I am not aware of.
The only limitations I know of are launching the kernel with a thread block of no more than 512 threads, and avoiding memory leaks.
However, it seems the kernel doesn’t run when it does certain calculations, and I don’t know why.
Perhaps there is a limit on how much memory a kernel can read from?
For instance, doing:
Result[i] = a[i];
or
Result[i] = b[i];
will work.
But doing
Result[i] = a[i]+b[i];
will not work.
(It’s a simplified example.)
So I have no idea why my kernels don’t run, what the cause is, or how I can debug it.

Any help will be appreciated.

MisterAnderson42 · September 24, 2008, 5:05pm

Standard debugging practice is to check for errors after every kernel launch.

Given your simplified example, I’m assuming that you are requesting too many resources for the launch: the 2nd simplified example will use more registers than the first, and you are probably exhausting the available registers if you request 512 threads per block. num_thread_per_block * register_usage must be less than 8192 (16384 on G200).
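
A minimal sketch of both checks, under some assumptions: addKernel, checkKernel and the array size are made-up stand-ins for the code in the question, and the calls are from the current CUDA runtime API (toolkits from 2008 used cudaThreadSynchronize where this uses cudaDeviceSynchronize). It prints the register budget for a 512-thread block and then checks the launch and the execution for errors separately.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the simplified example in the question.
__global__ void addKernel(float *result, const float *a, const float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        result[i] = a[i] + b[i];
}

// Hypothetical helper: returns false and prints a message if the last
// kernel could not be launched or failed while running.
static bool checkKernel(const char *label)
{
    cudaError_t err = cudaGetLastError();      // launch errors, e.g. too many resources requested
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // errors raised while the kernel executed
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int n = 1 << 20;                     // assumed problem size
    const int threadsPerBlock = 512;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    // Compare registers needed per block with what the device offers per block.
    cudaFuncAttributes attr;
    cudaDeviceProp prop;
    cudaFuncGetAttributes(&attr, addKernel);
    cudaGetDeviceProperties(&prop, 0);
    printf("registers/thread = %d, needed/block = %d, available/block = %d\n",
           attr.numRegs, attr.numRegs * threadsPerBlock, prop.regsPerBlock);

    float *a, *b, *result;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMalloc(&result, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    addKernel<<<blocks, threadsPerBlock>>>(result, a, b, n);
    bool ok = checkKernel("addKernel");

    cudaFree(a);
    cudaFree(b);
    cudaFree(result);
    return ok ? 0 : 1;
}
```

If the "needed/block" figure is larger than "available/block", the launch itself should fail and the first check will say so.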

Mesher · October 6, 2008, 10:25am

I have added CUT_CHECK_ERROR("Kernel execution failed"); after each kernel, and it doesn’t report anything, although I still get 0 values in the result memory.
I also checked how many registers my kernel uses: it uses 20 registers, so I tested it with a block size of 25, and it still gives me 0 values.
The thing is, the kernel gives results if I call it once (with a grid size of several blocks and 25 threads per block).
But if I call the same kernel several times inside a for loop, it does not give results.
I don’t know why it doesn’t work.

Edit: The problem is even more severe than I thought.
I run some code with a few kernels, built in release mode.
I view the output and there are results.
Then I run the same code again, without recompiling or changing anything.
I just run it, and I get a black image as the result. So the calculations failed.
Why is it so inconsistent?

Mesher · October 7, 2008, 1:37pm

I have made some progress, I guess, but still encounter “weird” problems.

It is possible that my previous problem was due to the fact that I didn’t free all of the CUDA memory I allocated.

Now I have a different, yet similar problem.

I run a CUDA kernel inside a double loop.

If I write the loops like this:

for (int i=0; i<1; i++)
    for (int j=0; j<1; j++)
        { do kernel }

or like this:

for (int i=1; i<2; i++)
    for (int j=0; j<1; j++)
        { do kernel }

then it works.

If I set the loops to be like this:

for (int i=0; i<2; i++)
    for (int j=0; j<1; j++)
        { do kernel }

then I get the CUDA error “unspecified launch failure”.

Why would it work when I run the kernel for i==0 alone or for i==1 alone, but not when I run it for both, one after the other?

MisterAnderson42 · October 7, 2008, 5:45pm

Check for any out-of-bounds memory accesses. Your seemingly random problems could be explained by them, and they are the most common cause of “unspecified launch failure”.

If you are running on Linux, you can compile in emulation mode and run your app through valgrind to find where the out-of-bounds memory accesses are occurring.
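
As a rough illustration only: the kernel below is a hypothetical 1D windowed sum (windowSum, the array size and the radius of 50 are assumptions, not the code from this thread). It shows the two guards that usually matter for a search window, one for threads past the end of the array and one for neighbours outside it, plus an error check after every launch in the double loop so the first failing iteration is reported. The calls are from the current CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: `radius` plays the role of the search window parameter.
__global__ void windowSum(float *out, const float *in, int n, int radius)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;                        // the last block may have threads past the end

    float sum = 0.0f;
    for (int k = -radius; k <= radius; ++k) {
        int j = i + k;
        if (j >= 0 && j < n)           // skip neighbours that fall outside the array
            sum += in[j];
    }
    out[i] = sum;
}

int main()
{
    const int n = 1000;                // assumed size, deliberately not a multiple of the block size
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;

    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    int status = 0;
    for (int i = 0; i < 2 && status == 0; ++i) {
        for (int j = 0; j < 1 && status == 0; ++j) {
            windowSum<<<blocks, threads>>>(out, in, n, 50);

            // Check the launch, then wait for the kernel and check its execution.
            cudaError_t err = cudaGetLastError();
            if (err == cudaSuccess)
                err = cudaDeviceSynchronize();
            if (err != cudaSuccess) {
                fprintf(stderr, "iteration (%d,%d): %s\n", i, j, cudaGetErrorString(err));
                status = 1;            // stop at the first failing iteration
            }
        }
    }

    cudaFree(in);
    cudaFree(out);
    return status;
}
```

Removing either guard in a kernel like this lets some threads read or write outside the allocations, which may appear to work on one run and fail on the next.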
