What's the best way to copy every other word, or every Nth word, using cudaMemcpy?
Use cudaMemcpy2D(). Conceptually, the source is treated as a tall, skinny 2D matrix whose row pitch (in bytes) is the stride, and you copy one narrow column of it: one element wide, with one row per element copied. Be aware that such strided copies can be significantly slower than large contiguous copies. For a worked example, you might want to refer to this Stack Overflow answer of mine:
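Independently of that linked answer, here is a minimal sketch of the idea. It assumes 4-byte `int` elements, a stride of 2 (every other element), and a device-to-host copy; the names `n`, `stride`, `h_src`, `h_dst`, and `d_src` are made up for illustration. The source pitch is the stride in bytes, the copied width is one element, and the height is the number of elements to gather.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int n = 8;        // number of elements to gather (illustrative)
    const int stride = 2;   // take every 2nd element (illustrative)

    int h_src[n * stride];
    int h_dst[n];
    for (int i = 0; i < n * stride; ++i) h_src[i] = i;

    // Put the full, contiguous source array on the device.
    int *d_src = nullptr;
    cudaError_t err = cudaMalloc(&d_src, sizeof(h_src));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    err = cudaMemcpy(d_src, h_src, sizeof(h_src), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Strided gather: view the source as n rows of `stride` ints each
    // and copy only the first int of every row.
    //   dst pitch = sizeof(int)           (destination is densely packed)
    //   src pitch = stride * sizeof(int)  (distance between wanted elements)
    //   width     = sizeof(int)           (bytes copied per row)
    //   height    = n                     (number of rows, i.e. elements)
    err = cudaMemcpy2D(h_dst, sizeof(int),
                       d_src, stride * sizeof(int),
                       sizeof(int), n,
                       cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy2D: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) printf("%d ", h_dst[i]);
    printf("\n");

    cudaFree(d_src);
    return 0;
}
```

Compiled with nvcc and run on a machine with a working CUDA device, this should print the even-indexed source values, 0 through 14. To start at a different offset, pass `d_src + offset` as the source pointer. To copy a few consecutive words at each stride rather than one, increase the width to `k * sizeof(int)`, set the destination pitch to the same value, and size the destination buffer to match.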