Efficient strided cudaMemcpy

What's the best way to copy every other word, or every few words, using cudaMemcpy?

Use cudaMemcpy2D(). Conceptually, the stride becomes the row pitch of a tall, skinny 2D matrix, and the words you want from each stride form the (much narrower) copied width of each row. Be aware that such strided copies can perform significantly worse than large contiguous copies. For a worked example, you might want to refer to this Stack Overflow answer of mine:

https://stackoverflow.com/questions/13535182/copying-data-to-cufftcomplex-data-struct
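
To make the "tall, skinny matrix" idea concrete, here is a minimal sketch of a device-to-host strided copy. It is not taken from the linked answer; the helper name copyStrided and its parameters (count, chunk, stride) are made up for illustration, and it assumes int elements, a packed destination, and chunk <= stride.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (name and parameters are illustrative only).
// Copies `count` groups of `chunk` consecutive ints from device memory,
// where consecutive groups start `stride` ints apart, into a packed host
// buffer of count * chunk ints. Assumes chunk <= stride.
cudaError_t copyStrided(int *h_dst, const int *d_src,
                        size_t count, size_t chunk, size_t stride)
{
    return cudaMemcpy2D(h_dst,
                        chunk * sizeof(int),   // dpitch: destination rows are packed
                        d_src,
                        stride * sizeof(int),  // spitch: bytes between row starts
                        chunk * sizeof(int),   // width: bytes copied per row
                        count,                 // height: number of rows
                        cudaMemcpyDeviceToHost);
}
```

With chunk = 1 and stride = 2 this copies every other word; with chunk = 2 and stride = 8 it copies the first two words of every eight. The same call shape should work in the other direction by swapping the pointers and pitches and passing cudaMemcpyHostToDevice.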