Quadro K2200 for some tests. When I created a vector of 209715200 elements and added one to each element, the result was not what I expected. However, there were no warnings or errors during execution. After I reduced the vector size, the code worked. Is there a limit on the size of the vector I can operate on?
You are probably not checking properly for CUDA errors at runtime.
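Here is a minimal sketch of runtime error checking. It assumes the original code is a simple kernel that adds one to a float array. The names `add_one`, `add_one_on_device`, and `CUDA_CHECK` are hypothetical and do not come from the question. The sketch checks the return value of every runtime API call. After the launch, `cudaGetLastError()` reports an invalid launch configuration, and `cudaDeviceSynchronize()` reports errors raised while the kernel was running. Without these checks a failed launch is silent, and the data is simply left unchanged.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line context if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",        \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical kernel: one thread per element, guarded against running past n.
__global__ void add_one(float* v, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        v[i] += 1.0f;
    }
}

// Hypothetical helper: copy `host` to the device, add one to every element,
// and copy the result back, checking every step.
void add_one_on_device(float* host, size_t n) {
    float* dev = nullptr;
    size_t bytes = n * sizeof(float);
    CUDA_CHECK(cudaMalloc(&dev, bytes));
    CUDA_CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));

    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
    add_one<<<blocks, threads>>>(dev, n);
    CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
}
```

If the program then stops with an error message instead of producing wrong results, the message shows which step failed.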
In the failing case, run your code with cuda-memcheck.
There is no limit on vector size other than what fits in device memory.
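To check whether a given size fits, you can compare the allocation size with the free memory reported by `cudaMemGetInfo`. The sketch below assumes `float` elements, and `fits_on_device` is a hypothetical helper. For the vector in the question, 209715200 * 4 bytes = 838,860,800 bytes (800 MiB), which is well within the 4 GB on a Quadro K2200. The free-memory figure is only an estimate, because the display, the CUDA context, and other processes share the card. The return value of `cudaMalloc` remains the authoritative check.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if n floats fit in the device memory that
// is currently reported as free. Also reports free/total bytes to the caller.
bool fits_on_device(size_t n, size_t* free_bytes, size_t* total_bytes) {
    if (cudaMemGetInfo(free_bytes, total_bytes) != cudaSuccess) {
        return false;  // could not query the device; treat as "does not fit"
    }
    return n * sizeof(float) <= *free_bytes;
}
```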