unspecified launch failure

Hi,

I am getting the following error when running a code:

 
./spher 
    80.85433959960938        0.4611731770210439     
    89.73984527587891        0.4757429858514819     
 REF_NLAY4_KERNEL:            4
 unspecified launch failure                                                                                                      
 REF_SUM_KERNEL:
 unspecified launch failure                                                                                                      
 REF_PROD_KERNEL:
 unspecified launch failure                                                                                                      
0: copyout Memcpy (host=0x1362060, dev=0x200300000, size=65536) FAILED: 4(unspecified launch failure)

The first two output lines show that the code ran and gave correct results. I put a loop around a single subroutine call so that it is called over and over again. Sometimes, for no apparent reason, the code crashes, as above, after two successful calls. Other times it runs hundreds of times successfully before crashing.

When I compile the code in emulation mode, it appears to run fine for thousands of calls.

Any ideas why this might happen? It seems as if crashes occur more frequently when array sizes in the computation are large.

Any insight would be greatly appreciated.

Thanks, Jan

Hi Jan,

Most likely your kernels are getting memory access errors. Check for out-of-bounds accesses and uninitialized memory reads/writes. Host code is much more forgiving of out-of-bounds memory accesses, while device code will die in the same circumstance.
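As a rough illustration of the out-of-bounds check, here is a minimal sketch in CUDA C. This is an assumption: the thread does not show Jan's code or its language, and the kernel `scale_kernel`, the helper `launch_and_check`, and the array size are all hypothetical. It guards the array index inside the kernel and checks for errors right after the launch. Kernels run asynchronously, so a failed launch is often only reported by a later call, as with the copyout Memcpy in the log above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element of `data`.
__global__ void scale_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The grid is rounded up to whole blocks, so the last block may hold
    // threads with i >= n. Without this guard they write past the array.
    if (i < n) {
        data[i] *= 2.0f;
    }
}

// Hypothetical helper: runs the kernel on n elements and returns the
// first CUDA error encountered (cudaSuccess if none).
cudaError_t launch_and_check(int n)
{
    float *d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;

    // Initialize device memory so the kernel never reads garbage.
    err = cudaMemset(d_data, 0, n * sizeof(float));
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(d_data, n);

        err = cudaGetLastError();            // invalid launch configuration
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();   // errors raised while the kernel ran
    }
    cudaFree(d_data);
    return err;
}

int main()
{
    cudaError_t err = launch_and_check(1000);
    printf("kernel status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Synchronizing after every launch slows the program down, so a check like this is best kept to debugging runs. Its benefit is that the error is attributed to the kernel that caused it rather than to a later copy.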

Sans debugger support (we're working on it!), I start by commenting out portions of the kernel code and/or using print statements to narrow down the problem code. I'll also sometimes use temp arrays to hold intermediate values.
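The temp-array idea can be sketched as follows, again in CUDA C and under the same assumption that this is not Jan's code. The kernel `gather_kernel`, the debug array `dbg`, and the helper `count_bad_indices` are hypothetical names. The kernel records a computed index in an extra array before using it, and the host copies that array back to look for values outside the valid range.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: out[i] = in[idx[i]]. The index it is about to use
// is also written to dbg[i] so the host can inspect it afterwards.
__global__ void gather_kernel(const float *in, const int *idx,
                              float *out, int *dbg, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int j = idx[i];
    dbg[i] = j;                           // record the intermediate value
    if (j >= 0 && j < n) out[i] = in[j];  // skip the suspect access while debugging
}

// Hypothetical helper: returns how many recorded indices fall outside [0, n).
// CUDA error checks are omitted here to keep the sketch short.
int count_bad_indices(const std::vector<int> &h_idx)
{
    int n = (int)h_idx.size();
    std::vector<float> h_in(n, 1.0f);
    std::vector<int> h_dbg(n, 0);

    float *d_in = nullptr, *d_out = nullptr;
    int *d_idx = nullptr, *d_dbg = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMalloc(&d_dbg, n * sizeof(int));

    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(float));
    cudaMemset(d_dbg, 0, n * sizeof(int));

    gather_kernel<<<(n + 127) / 128, 128>>>(d_in, d_idx, d_out, d_dbg, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_dbg.data(), d_dbg, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_idx); cudaFree(d_dbg);

    int bad = 0;
    for (int j : h_dbg)
        if (j < 0 || j >= n) ++bad;
    return bad;
}

int main()
{
    // Two of these four indices (5 and -1) are outside [0, 4).
    std::vector<int> idx = {0, 1, 5, -1};
    printf("out-of-range indices: %d\n", count_bad_indices(idx));
    return 0;
}
```

Once the debug array shows where a bad index comes from, the guard and the extra array can be removed again.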

- Mat