Inconsistent results: watchdog issues?

All,

I have developed a kernel that prices options using a trinomial tree (it's OK not to know what this means).

It is enough to know that my program has two inputs: NUM_OPTIONS and NUM_STEPS.

NUM_STEPS determines the size (and hence the running time) of each problem.

NUM_OPTIONS is the number of such options to be evaluated.

I launch NUM_OPTIONS blocks, and each block evaluates the corresponding option.
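A rough cost estimate helps explain why doubling NUM_OPTIONS can push a run past the watchdog limit. It assumes a standard recombining trinomial tree priced by backward induction, which the post does not spell out: level i of such a tree has 2i+1 nodes, so with N = NUM_STEPS the work is

$$W_{\text{option}} = \sum_{i=0}^{N-1} (2i+1) = N^2, \qquad W_{\text{total}} = \text{NUM\_OPTIONS} \cdot N^2 .$$

For N = 16,000 that is 2.56 × 10^8 node updates per option, or about 2.56 × 10^10 for 100 options and 5.12 × 10^10 for 200. If 100 blocks already keep the GPU fully busy, 200 blocks should take roughly twice as long, which is consistent with the timings described in this thread.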

Now I find that my program sometimes behaves inconsistently for certain problem sizes.

For example,

  1. With 100 as NUM_OPTIONS and 16,000 as NUM_STEPS, my program works fine
    every time I run it.

  2. With 200 as NUM_OPTIONS and 16,000 as NUM_STEPS, my program behaves
    inconsistently: sometimes it works, and sometimes it just returns 0 after what
    seems like a hang.

All the inconsistent cases run for more than 8 seconds. Sometimes I get correct results and sometimes not; sometimes I get correct results even after 11 or 12 seconds. I am running under Windows XP.

All option evaluations that finish within 5 seconds are stable and consistent.

Does this have something to do with the watchdog? Kindly enlighten me.

NOTE:

I have verified that my GPU memory allocations are fine and there are no errors.

The only way to know if you are hitting the watchdog is to check for errors after the kernel call.

kernel<<<grid, threads>>>();

// Check for errors. Synchronizing after every launch costs performance,
// so this is best done only in debug builds.
cudaThreadSynchronize(); // cudaDeviceSynchronize() in newer toolkits
cudaError_t error = cudaGetLastError();
if (error != cudaSuccess)
    printf("CUDA error: %s\n", cudaGetErrorString(error));

If you are reaching the watchdog timeout, you should get the error message “launch timeout” or something similar. You can test this by putting an infinite loop in kernel().

Thanks for the info, Mr. Anderson. I did not know such handy functions existed. Thanks.

Best Regards,
Sarnath

Hooo… Aaah… You were right, Mr. Anderson… It was the watchdog timeout.

cudaGetLastError returned a value of 6, meaning launch timeout (cudaErrorLaunchTimeout). I found the enum in "driver_types.h" in the CUDA toolkit include directory.
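For anyone wanting to detect this case programmatically, here is a minimal sketch that compares against the named enumerator rather than the literal 6, since the numeric values of cudaError_t are not guaranteed to stay fixed across toolkit releases. The helper names isWatchdogTimeout and checkLaunch are hypothetical, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the error is the watchdog killing the kernel.
// Uses the enumerator name, not the raw number seen in one header version.
bool isWatchdogTimeout(cudaError_t err)
{
    return err == cudaErrorLaunchTimeout;
}

// Hypothetical helper: call right after a launch. Waits for the kernel,
// reports any error (flagging watchdog kills), returns true on success.
bool checkLaunch(const char *label)
{
    cudaError_t err = cudaGetLastError();      // launch-configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // errors during execution
    if (err == cudaSuccess)
        return true;
    if (isWatchdogTimeout(err))
        fprintf(stderr, "%s: killed by the watchdog (%s)\n",
                label, cudaGetErrorString(err));
    else
        fprintf(stderr, "%s: CUDA error %d (%s)\n",
                label, (int)err, cudaGetErrorString(err));
    return false;
}
```

Usage would be `kernel<<<grid, threads>>>(); checkLaunch("kernel");`, kept to debug builds for the same performance reason given above.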

That's a relief for me. Now I need NOT debug those 200-odd lines of CUDA code…

Is there a way to turn this watchdog timer off in Windows XP (a registry entry or something)?

FAQ #33: http://forums.nvidia.com/index.php?showtopic=36286
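Independently of whether the watchdog can be disabled, a common way to stay under its limit is to split the work into several shorter launches. In this thread, 100 options at 16,000 steps ran reliably, so launching the options in batches of that size is one option. A minimal sketch, assuming a hypothetical kernel priceOptionsKernel whose body is only a placeholder for the real trinomial-tree code, and a hypothetical OPTIONS_PER_LAUNCH that would need to be tuned by timing on the actual card:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical batch size, small enough that one launch stays well under the
// watchdog limit. Tune by measurement.
const int OPTIONS_PER_LAUNCH = 100;

// Number of launches needed to cover numOptions options.
int numBatches(int numOptions, int perLaunch)
{
    return (numOptions + perLaunch - 1) / perLaunch;
}

// Hypothetical kernel: still one block per option, but offset by firstOption
// so each launch covers one contiguous slice of the options.
__global__ void priceOptionsKernel(const float *in, float *out,
                                   int firstOption, int numSteps)
{
    int option = firstOption + blockIdx.x;
    if (threadIdx.x == 0)
        out[option] = in[option]; // placeholder for the trinomial-tree evaluation
    (void)numSteps;
}

// Launches the options in batches and checks each launch, so a failure is
// attributed to the batch that caused it. Returns the first error, if any.
cudaError_t priceAllOptions(const float *d_in, float *d_out, int numOptions,
                            int numSteps, int threadsPerBlock)
{
    int batches = numBatches(numOptions, OPTIONS_PER_LAUNCH);
    for (int b = 0; b < batches; ++b) {
        int first = b * OPTIONS_PER_LAUNCH;
        int remaining = numOptions - first;
        int count = remaining < OPTIONS_PER_LAUNCH ? remaining : OPTIONS_PER_LAUNCH;
        priceOptionsKernel<<<count, threadsPerBlock>>>(d_in, d_out, first, numSteps);
        cudaError_t err = cudaGetLastError();  // launch-configuration errors
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();     // errors during execution
        if (err != cudaSuccess)
            return err;
    }
    return cudaSuccess;
}
```

With 200 options this issues two launches of 100 blocks each, so each launch does roughly the amount of work that was observed to finish reliably.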

Right. But there is another forum thread where people have run into problems even on a secondary adapter :-)

Please see Tim's post in the GPU Computing discussion under the topic "Tesla and Watchdog".

http://forums.nvidia.com/index.php?showtopic=62434 – tesla and watchdog.

I'd appreciate it if you could comment on the "Tesla and watchdog" topic above.

http://forums.nvidia.com/index.php?showtop…98&#entry176998

This is where the problem with the secondary graphics adapter is discussed.