Problem with cudaMallocPitch and cudaMemcpy2D

Here is the code that causes the problem:

[codebox]

int *neuronxy_cuda;

size_t pitch_neuronxy_cuda; // pitch is the width of each row in bytes

cudaMallocPitch((void**)&neuronxy_cuda, &pitch_neuronxy_cuda,2 * sizeof(int), 40);

set_neuron_cordinate_cuda<<<1,40>>>(neuronxy_cuda, neurondisplay, ind->nneurons, ind->ninputs, nhiddens, ind->noutputs, pitch_neuronxy_cuda);

cudaMemcpy2D(&neuronxy, 2 * sizeof(int), neuronxy_cuda, pitch_neuronxy_cuda, 2 * sizeof(int), 40, cudaMemcpyDeviceToHost);

[/codebox]
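For reference, here is a minimal sketch of a complete pitched round trip, assuming a 40x2 int array as in the question. The kernel set_coords_pitched and the helper run_pitched_example are hypothetical names, and the values written are placeholders, not the logic of the original set_neuron_cordinate_cuda. Inside the kernel, rows must be stepped by pitch bytes (casting to char* first), not by 2 ints, otherwise cudaMemcpy2D copies back elements from the wrong positions. The host destination is passed as the array itself; in the question, &neuronxy is only equivalent to that if neuronxy is a true array (e.g. int neuronxy[40][2]) and not a pointer, which the post does not show. The sketch uses the current cudaDeviceSynchronize; toolkits from the emulation-mode era used cudaThreadSynchronize instead.

```cuda
#include <cuda_runtime.h>

#define ROWS 40  // number of neurons, as in the question
#define COLS 2   // x and y coordinate per neuron

// Hypothetical kernel: one thread per row of the pitched array.
__global__ void set_coords_pitched(int *base, size_t pitch, int rows)
{
    int r = threadIdx.x;
    if (r >= rows) return;                         // guard against extra threads
    int *row = (int *)((char *)base + r * pitch);  // advance by pitch BYTES per row
    row[0] = r;                                    // placeholder "x"
    row[1] = 2 * r;                                // placeholder "y"
}

// Allocates a pitched ROWS x COLS int array, fills it on the device and
// copies it into a tightly packed host array. Returns the first error seen.
cudaError_t run_pitched_example(int host[ROWS][COLS])
{
    int *dev = NULL;
    size_t pitch = 0;  // row width in bytes chosen by the runtime (>= COLS * sizeof(int))
    cudaError_t err = cudaMallocPitch((void **)&dev, &pitch, COLS * sizeof(int), ROWS);
    if (err != cudaSuccess) return err;

    set_coords_pitched<<<1, ROWS>>>(dev, pitch, ROWS);
    err = cudaGetLastError();          // errors in the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize(); // errors raised while the kernel ran

    if (err == cudaSuccess)
        // dpitch = packed host row width, spitch = device pitch,
        // width = bytes actually copied per row.
        err = cudaMemcpy2D(host, COLS * sizeof(int), dev, pitch,
                           COLS * sizeof(int), ROWS, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}
```

Calling it from main with int host[40][2] should return cudaSuccess and leave host[r][0] == r and host[r][1] == 2 * r for every row.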

The problem is that the code behaves differently in emulation mode and in non-emulation mode.

In emulation mode, “neuronxy_cuda” is filled with the correct values, “cudaMemcpy2D” copies them into neuronxy, and the returned error code (cudaError) is “cudaSuccess”.

In non-emulation mode, “neuronxy_cuda” does not contain the correct values, “cudaMemcpy2D” does not copy these (wrong) values into neuronxy, and the returned error code (cudaError) is “cudaErrorLaunchFailure”.

Why does this happen?

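“cudaErrorLaunchFailure” means that your kernel failed while it was running on the device. Kernel launches are asynchronous, so the failure is only reported by the next runtime call, which here is cudaMemcpy2D.
I read the code you posted and see nothing wrong with it.
So I suspect the kernel itself. If possible, could you post the kernel code?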
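While waiting for the kernel code, it helps to check for errors immediately after the launch so the failure is attributed to the kernel rather than to cudaMemcpy2D. One possible cause of the emulation/non-emulation difference: in device emulation mode the kernel runs on the CPU, so dereferencing a host pointer inside the kernel can appear to work, while on the GPU it fails. Whether that applies here depends on the kernel and on arguments such as neurondisplay, which are not shown. Below is a minimal sketch; write_index and launch_checked are hypothetical names, and cudaDeviceSynchronize is the current name of the older cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its global index into out[].
__global__ void write_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Launches write_index and reports the two kinds of failure separately.
// Returns the first error encountered.
cudaError_t launch_checked(int *d_out, int n, int threads)
{
    write_index<<<(n + threads - 1) / threads, threads>>>(d_out, n);

    // 1) Errors in the launch itself (e.g. too many threads per block).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // 2) Errors raised while the kernel ran (e.g. an invalid memory access).
    //    Without this synchronization they only surface at the next runtime
    //    call, such as cudaMemcpy2D.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

With a valid configuration this returns cudaSuccess; launching with more threads per block than the device allows (e.g. 2048) is caught by the first check as cudaErrorInvalidConfiguration, before any copy is attempted.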