Strange error on cudaMemcpy: "the launch timed out and was terminated"

Hi guys, I’m working on my thesis, a parallel Artificial Neural Network implementation. Basically I read a file describing the problem to solve as input, and everything works fine until I feed it a very large input file.
At that point my code starts giving the error “execution failed : (6) the launch timed out and was terminated.” The strange part is that the failing line is a cudaMemcpy, not a kernel launch!

cudaMalloc( (void**) &m_dLayersSize, m_numOfLayers * sizeof(uint32_t) );
optimizerCudaCheckError("initSolutionSet: cudaMalloc() execution failed\n", __FILE__, __LINE__);
cudaMemcpy(m_dLayersSize, m_hLayersSize, m_numOfLayers * sizeof(uint32_t), cudaMemcpyHostToDevice);
optimizerCudaCheckError("initSolutionSet: cudaMemcpy() execution failed\n", __FILE__, __LINE__); // this line gives the error

I checked the host variables and they’re fine: m_numOfLayers is 4 and the values in the host vector are 2500, 300, 100, 27.
With other examples everything works… I’m a bit confused… I suspect the error message isn’t literally true, but reflects something involving memory and kernel launch complexity.
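To rule out the copy itself, one option is to read the buffer straight back from the device and compare it with the host values. A minimal standalone sketch, reusing the variable names from the snippet above (everything else is made up for illustration):

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const uint32_t m_numOfLayers = 4;
    uint32_t m_hLayersSize[] = {2500, 300, 100, 27};
    uint32_t *m_dLayersSize = NULL;

    cudaMalloc((void**)&m_dLayersSize, m_numOfLayers * sizeof(uint32_t));
    cudaMemcpy(m_dLayersSize, m_hLayersSize,
               m_numOfLayers * sizeof(uint32_t), cudaMemcpyHostToDevice);

    // Read the data straight back and compare with the host copy.
    uint32_t check[4];
    cudaMemcpy(check, m_dLayersSize,
               m_numOfLayers * sizeof(uint32_t), cudaMemcpyDeviceToHost);
    for (uint32_t i = 0; i < m_numOfLayers; ++i)
        printf("layer %u: host=%u device=%u\n", i, m_hLayersSize[i], check[i]);

    cudaFree(m_dLayersSize);
    return 0;
}
```

If this isolated version works but the same calls fail inside the full program, the error is almost certainly carried over from something that ran earlier.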

My card is a GTX 560 Ti (compute capability 2.1) and the CUDA SDK version is 4.0.

I can’t post my full code since it spans 3 libraries… something like 4000 lines.

Thanks to all, Enrico

edit: if I run the program with GDM shut down, the program gets a little further, as I expected, but then fills the screen with artifacts and stops… I’m so desperate

I have to solve a similar problem, reading a lot of image data into the GPU. I haven’t figured it out yet, but the following link might give you some inspiration.

http://stackoverflow.com/questions/6185117/cudamemcpy-errorthe-launch-timed-out-and-was-terminated

The cudaMemcpy() is presumably the first synchronous operation after a preceding kernel launch. If you don’t insert error checking directly after a kernel launch, asynchronously occurring errors, such as triggering the watchdog timer, will be reported at the next synchronous operation. I am pretty sure this behavior is described in the documentation.
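The checking pattern described above can be sketched like this (the kernel name and launch configuration are made up for illustration):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(float *data, int n) {  // placeholder kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void launchChecked(float *d_data, int n) {
    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // cudaGetLastError() catches launch-configuration errors immediately...
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

    // ...while an explicit synchronize surfaces asynchronous errors (such
    // as the watchdog timeout) here, at the launch site, instead of at
    // the next unrelated synchronous call such as a cudaMemcpy().
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel execution failed: %s\n", cudaGetErrorString(err));
}
```

Synchronizing after every launch costs performance, so one common approach is to compile the synchronize in only for debug builds.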

The watchdog timer is a feature of all operating systems supported by CUDA that prevents a non-graphics kernel from blocking the GUI for undue periods of time. I think the limit is typically in the 2-5 second range (I haven’t measured it recently). Triggering the watchdog timer means that the CUDA kernel ran too long. You can either change your setup to run without a GUI (e.g. run without X in Linux), or reduce the runtime of the kernel. A further option is to use two cards, a low-end one for displaying the graphical desktop and a high-end one for compute tasks. Make sure to exclude the second GPU from running the desktop. When a GPU is set aside for compute work you can run CUDA kernels for as long as you desire, as no watchdog timer comes into play.
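Whether the watchdog applies to a given GPU can be queried at runtime through the device properties; a small sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // kernelExecTimeoutEnabled is nonzero when the display watchdog
        // can terminate long-running kernels on this device.
        printf("device %d (%s): watchdog %s\n", dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```

On a two-card setup as suggested above, the compute-only card should report the watchdog as disabled.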