Recently I was granted access to a server with a K20. When I run my program on the K20, several different function calls fail with code=4 (cudaErrorLaunchFailure):
CUDA error at twig.cu:127 code=4(cudaErrorLaunchFailure) "cudaMemcpy(host_match_out, device_match_out, size_configs * sizeof(int), cudaMemcpyDeviceToHost)"
CUDA error at twig.cu:128 code=4(cudaErrorLaunchFailure) "cudaEventRecord(output_timer_stop, 0)"
CUDA error at twig.cu:129 code=4(cudaErrorLaunchFailure) "cudaEventRecord(cuda_timer_stop, 0)"
CUDA error at twig.cu:143 code=4(cudaErrorLaunchFailure) "cudaEventSynchronize(cuda_timer_stop)"
CUDA error at twig.cu:144 code=4(cudaErrorLaunchFailure) "cudaEventSynchronize(execution_timer_stop)"
CUDA error at twig.cu:145 code=4(cudaErrorLaunchFailure) "cudaEventSynchronize(input_timer_stop)"
CUDA error at twig.cu:146 code=4(cudaErrorLaunchFailure) "cudaEventSynchronize(output_timer_stop)"
CUDA error at twig.cu:149 code=4(cudaErrorLaunchFailure) "cudaEventElapsedTime(&elapsedTotalTime, cuda_timer_start, cuda_timer_stop)"
However, I do not receive these errors on a Tesla C2075. They also disappear if I compile the program with the -g -G flags (generate debug information).
What could be a possible source of these errors?
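
Since every call from the cudaMemcpy onward reports the same code, my guess is that the failure comes from the kernel launched before line 127 and is only reported by the later calls. Below is a minimal sketch of how the launch itself could be checked. The CUDA_CHECK macro and match_kernel are hypothetical stand-ins, not the code from twig.cu.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from twig.cu): abort with file/line on any error.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error at %s:%d code=%d (%s) \"%s\"\n",  \
                    __FILE__, __LINE__, (int)err_,                        \
                    cudaGetErrorString(err_), #call);                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical stand-in for the real kernel: writes each thread's index.
__global__ void match_kernel(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // bounds check, so no thread writes past the buffer
        out[i] = i;
    }
}

int main() {
    const int n = 1024;
    int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));

    match_kernel<<<(n + 255) / 256, 256>>>(d_out, n);
    CUDA_CHECK(cudaGetLastError());       // reports invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());  // reports faults raised while the kernel ran

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

With a check like this right after each launch, the error should be attributed to the kernel that actually faulted rather than to the next runtime call. Running the binary under cuda-memcheck might also show whether an out-of-bounds access is involved.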