Confusing result when running a kernel

I have developed a kernel function that sorts an array. I transfer the array to global memory. With an array size of up to 8*1024*1024, the program compiles and runs correctly using (8*1024+9) blocks with 1024 threads per block.

If I increase the size to 16*1024*1024 or more, the program still compiles and runs, but the array is not sorted (perhaps the kernel is not running at all).

Can anyone explain what is happening and suggest a solution?
I'm using a GeForce GT 740M GPU.

You may have exceeded the maximum grid X or Y dimension.

this is the maximum GRID_DIM_X

I think my array size is less than the maximum GRID_DIM_X.
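One way to check this rather than guess is to ask the runtime for the device limits and compare them with the launch configuration. The sketch below is only an illustration: it assumes device 0 and a one-dimensional launch with 1024 threads per block, as in the question. Note that the number to compare against the grid limit is the block count, not the array size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Assumption: the GPU of interest is device 0.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Launch configuration taken from the question.
    const size_t n = 16u * 1024 * 1024;       // number of array elements
    const unsigned threadsPerBlock = 1024;
    // Round up so every element is covered by a thread.
    const size_t blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    printf("compute capability:    %d.%d\n", prop.major, prop.minor);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max grid size (x):     %d\n", prop.maxGridSize[0]);
    printf("blocks needed:         %zu\n", blocks);

    if (blocks > (size_t)prop.maxGridSize[0]) {
        printf("the block count exceeds the grid x limit\n");
    } else {
        printf("the block count fits within the grid x limit\n");
    }
    return 0;
}
```

If the block count fits, the grid dimension is probably not the cause, and the next step is to look at what the launch itself reports.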

Are you doing proper CUDA error checking?

What should I do for error checking?

Google this:

proper cuda error checking

Click on the first hit.
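For reference, a common pattern is to wrap every runtime call in a checking macro and to check twice after a kernel launch: once for launch errors and once for errors during execution. The following is a minimal sketch, not the code from the search result; `CUDA_CHECK` and `sortKernel` are hypothetical names, and the kernel body is a placeholder for the real sorting code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error with its location and abort.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for the sorting kernel from the question.
__global__ void sortKernel(int *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i];  // the real sorting work would go here
    }
}

int main() {
    const size_t n = 16u * 1024 * 1024;  // size that fails in the question
    const unsigned threads = 1024;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);

    int *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(int)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(int)));

    sortKernel<<<blocks, threads>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());       // reports invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_data));
    printf("kernel completed without CUDA errors\n");
    return 0;
}
```

A kernel launch does not return an error code itself, so without the two checks after the launch a failed kernel can go unnoticed and the program simply continues with unsorted data.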

Thank you all.
I will try it.