_rtContextLaunch2D returned (702): Launch timeout)

Running an application using OptiX, I get the following error message when calling context->launch:

OptiX Error: 'Unknown error (Details: Function "RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (702): Launch timeout
================================================================================
Backtrace:
        (0) () +0x711f17
        (1) () +0x7102cb
        (2) () +0x30fe51
        (3) () +0x5bc0c3
        (4) () +0x5bce94
        (5) () +0x1c81bf
        (6) () +0x1c8946
        (7) () +0x1c9557
        (8) () +0x17a69b
        (9) rtContextLaunch2D() +0x2b9
        (10) runSimulation(int, char**, bool) +0x1c5
        (11) __libc_start_main() +0xf0
        (12) _start() +0x29

================================================================================
)'

What is the problem here and how can I fix it?

System information:

Linux 4.4.0-141-generic #167-Ubuntu SMP
NVIDIA-SMI 390.87
Device 0: "GeForce GTX TITAN X"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    5.2
NVIDIA-OptiX-SDK-5.1.1-linux64

A launch timeout means a single launch ran longer than the timeout allows: either you tried to do too much work at once, or an error in your code resulted in an endless loop.

How many rays are you shooting in a single launch?
Does it work when reducing the workload?
How long does it take without the timeout then?

I don’t know what timeout limits exist under Linux. Under Windows there is a 2 second kernel driver timeout under the Windows Display Driver Model (WDDM). If a launch exceeds it, the OS triggers a Timeout Detection and Recovery (TDR), which stops the driver and restarts it.
(In the past, e.g. under Windows XP, there was a 15 second timeout before a bluescreen.)

There are two standard ways to overcome that. Either do less work more often, e.g. with progressive algorithms, tiled rendering, or separating additive calculations (such as one launch per light), or use a dedicated compute GPU that does not fall under any of the OS timeout limits, e.g. Tesla or Quadro workstation boards in Tesla Compute Cluster (TCC) driver mode under Windows.
Search for “TDR” on this forum, that problem has been discussed before.
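
As a rough illustration of the "do less work more often" approach, here is a minimal C++ sketch that splits one large 2D launch into smaller tiles, so that each launch stays short. The names launchTiled and TileLaunch are hypothetical helpers, not part of OptiX. The callback is where the real launch would go; in an OptiX 5.x application that would presumably mean passing the tile origin to the ray generation program through a user-declared offset variable (an assumption about your program) and then calling context->launch with the tile size. The tile size is something to tune for your workload.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical callback that performs ONE launch covering the tile whose
// lower corner is (x0, y0) and whose size is w x h pixels.
// In an OptiX application this is where context->launch would be called
// with the tile size, after passing (x0, y0) to the device code.
using TileLaunch =
    std::function<void(unsigned x0, unsigned y0, unsigned w, unsigned h)>;

// Splits a width x height launch into tiles of at most tileSize x tileSize
// and invokes launchTile once per tile, row by row.
// Returns the number of tiles that were launched.
unsigned launchTiled(unsigned width, unsigned height, unsigned tileSize,
                     const TileLaunch& launchTile)
{
    assert(tileSize > 0);

    unsigned count = 0;
    for (unsigned y0 = 0; y0 < height; y0 += tileSize) {
        // Tiles on the top edge may be shorter than tileSize.
        const unsigned h = std::min(tileSize, height - y0);
        for (unsigned x0 = 0; x0 < width; x0 += tileSize) {
            // Tiles on the right edge may be narrower than tileSize.
            const unsigned w = std::min(tileSize, width - x0);
            launchTile(x0, y0, w, h);
            ++count;
        }
    }
    return count;
}
```

Calling launchTiled(width, height, 256, callback) in place of a single full-size launch keeps each individual launch smaller. If a launch still times out at a small tile size, that points to an endless loop in the device code rather than to the workload.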