What's the best way to copy every other word, or every other group of a few words, using cudaMemcpy?
Use cudaMemcpy2D(). Conceptually, treat the data as a tall, skinny 2D matrix: the stride becomes the row pitch, and the words you want to copy are a narrow column at the start of each row. Be aware that the performance of such strided copies can be significantly lower than that of large contiguous copies. For a worked example, you might want to refer to this Stack Overflow answer of mine:
cuda - Copying data to "cufftComplex" data struct? - Stack Overflow
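To make the cudaMemcpy2D() mapping concrete, here is a minimal sketch, assuming the words are ints, the source is in host memory, and the destination is a densely packed device buffer. The helper name copyStridedToDevice and its parameters are made up for illustration and are not part of the CUDA API. Each "row" is one group of `run` words, and the source pitch is the stride in bytes:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (not a CUDA API function): copies `count` groups of
// `run` consecutive ints from host memory to a packed device buffer, where
// successive groups start `stride` ints apart in the source.
// Assumes run <= stride, that h_src holds at least (count - 1) * stride + run
// ints, and that d_dst has room for count * run ints.
cudaError_t copyStridedToDevice(int* d_dst, const int* h_src,
                                size_t run, size_t stride, size_t count)
{
    return cudaMemcpy2D(d_dst,
                        run * sizeof(int),     // dpitch: destination rows are packed
                        h_src,
                        stride * sizeof(int),  // spitch: distance between source rows
                        run * sizeof(int),     // width: bytes copied per row
                        count,                 // height: number of rows (groups)
                        cudaMemcpyHostToDevice);
}
```

With run = 1 and stride = 2 this copies every other word; with run = 3 and stride = 6 it copies three words, skips three, and so on. All pitches and the width are given in bytes, not elements, and each pitch must be at least as large as the width.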