CUDA launch failed

Can someone please remind me what could cause cuCtxSynchronize(), called after cuLaunchGrid(), to return CUDA_ERROR_LAUNCH_FAILED?
EDIT: For anyone looking for one possible cause of this error: it happens when the kernel accesses addresses it shouldn't, for example by reading or writing past the end of an array. I had hard-coded values for unrolling loops, then changed the input data to a smaller array, so the unrolled loops indexed past its end. That's a bad idea.
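The bug in the edit is easy to reproduce: a loop unrolled with a fixed trip count keeps indexing as if the input were still its old size. Below is a minimal sketch of a guarded version, written against the CUDA runtime API rather than the driver API (`cuLaunchGrid`, `cuCtxSynchronize`) used in the question. The kernel `scale`, the helpers `check` and `run_scale`, and the `UNROLL` factor are all hypothetical names for illustration. The two points it shows are that the bounds check uses the real length passed at launch time, and that faults raised while the kernel runs are reported by the synchronize call, not by the launch itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical unroll factor standing in for the hard-coded values in the post.
#define UNROLL 4

// Each thread handles UNROLL consecutive elements. The guard uses the real
// length n passed at launch time, so shrinking the input cannot push any
// thread past the end of the arrays.
__global__ void scale(const float *in, float *out, int n, float factor)
{
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * UNROLL;
#pragma unroll
    for (int k = 0; k < UNROLL; ++k) {
        int i = base + k;
        if (i < n)  // without this check the tail threads go out of bounds
            out[i] = in[i] * factor;
    }
}

// Print a failing CUDA call and return 1, or return 0 on success.
static int check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

// Scales the values 0..n-1 by factor on the GPU and stores the last output
// in *last. Returns 0 on success, 1 if any CUDA call fails.
static int run_scale(int n, float factor, float *last)
{
    const int threads = 128;
    const int perBlock = threads * UNROLL;
    const int blocks = (n + perBlock - 1) / perBlock;  // round up
    float *in = NULL, *out = NULL;

    int rc = check(cudaMallocManaged(&in, n * sizeof(float)), "alloc in");
    if (rc == 0) rc = check(cudaMallocManaged(&out, n * sizeof(float)), "alloc out");
    if (rc == 0) {
        for (int i = 0; i < n; ++i) in[i] = (float)i;
        scale<<<blocks, threads>>>(in, out, n, factor);
        // Bad launch configurations are reported here...
        rc = check(cudaGetLastError(), "launch");
        // ...but faults inside the running kernel only surface at the sync.
        if (rc == 0) rc = check(cudaDeviceSynchronize(), "sync");
        if (rc == 0) *last = out[n - 1];
    }
    cudaFree(in);   // cudaFree(NULL) is a no-op
    cudaFree(out);
    return rc;
}

int main(void)
{
    float last = 0.0f;
    // 1000 is deliberately not a multiple of threads * UNROLL (512).
    if (run_scale(1000, 2.0f, &last) != 0) return 1;
    printf("last = %.1f\n", last);  // expected: last = 1998.0
    return 0;
}
```

If you delete the `if (i < n)` guard, threads in the last block touch elements past index 999. A small overrun like that may not fault at all, so a memory checker such as `compute-sanitizer` (formerly `cuda-memcheck`) is a more reliable way to find it than waiting for `CUDA_ERROR_LAUNCH_FAILED`. On newer toolkits, an out-of-bounds access that does fault is usually reported as an illegal-address error rather than a generic launch failure.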