Data transfer issue: storing values on the GPU for the next iteration

I have code which calls a kernel 10000 times, and each time cudaMemcpy has to be called right before and after the kernel launch. Obviously this takes a lot of time.
Is there any way I can keep my data on the GPU between iterations and call cudaMemcpy only at the start and end of the 10000 iterations?

Are you working in the next iteration on the same data that came out of the previous iteration? Then there is no need to do the memory copies inside the loop at all: device memory persists between kernel launches, so just pass the same device pointers again.
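A minimal sketch of that pattern (the `step` kernel and the in-place update it performs are placeholders, not your actual computation): allocate and copy in once, launch 10000 times against the same device pointer, copy out once.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical kernel: one step of an iterative update, done in place
// so the output of one launch is the input of the next.
__global__ void step(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = 0.5f * data[i] + 1.0f;   // placeholder computation
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);

    // One copy in before the loop ...
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // ... then 10000 launches that reuse the same device buffer;
    // no host<->device traffic inside the loop.
    for (int iter = 0; iter < 10000; ++iter)
        step<<<(n + 255) / 256, 256>>>(d_data, n);

    // ... and one copy out after the loop.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    printf("h_data[0] = %f\n", h_data[0]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```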

If you work on a different dataset each iteration (one that does not depend on the previous iteration), then you can either copy all the data to the device at the beginning, or use streams to overlap the memory copies with the computation.
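The streams variant could look like the sketch below (assuming independent chunks and a placeholder `process` kernel). Two streams are double-buffered so that while one stream computes on chunk k, the other is already transferring chunk k+1; note that asynchronous copies require pinned host memory (`cudaMallocHost`).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel; the real computation goes here.
__global__ void process(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;   // placeholder computation
}

int main(void)
{
    const int nChunks = 8;
    const int chunk   = 1 << 18;
    size_t bytes = chunk * sizeof(float);

    // Pinned host memory is required for cudaMemcpyAsync to overlap with kernels.
    float *h_buf;
    cudaMallocHost(&h_buf, (size_t)nChunks * bytes);
    for (int i = 0; i < nChunks * chunk; ++i) h_buf[i] = 1.0f;

    float *d_buf[2];
    cudaStream_t s[2];
    for (int i = 0; i < 2; ++i) {
        cudaMalloc(&d_buf[i], bytes);
        cudaStreamCreate(&s[i]);
    }

    // Alternate between the two streams: copy in, compute, copy out,
    // all enqueued asynchronously so transfers overlap with computation.
    for (int k = 0; k < nChunks; ++k) {
        int i = k % 2;
        float *h_chunk = h_buf + (size_t)k * chunk;
        cudaMemcpyAsync(d_buf[i], h_chunk, bytes, cudaMemcpyHostToDevice, s[i]);
        process<<<(chunk + 255) / 256, 256, 0, s[i]>>>(d_buf[i], chunk);
        cudaMemcpyAsync(h_chunk, d_buf[i], bytes, cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < 2; ++i) {
        cudaFree(d_buf[i]);
        cudaStreamDestroy(s[i]);
    }
    cudaFreeHost(h_buf);
    return 0;
}
```

Whether this actually overlaps depends on the device having separate copy and compute engines, which any reasonably recent GPU does.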

If you need a more specific answer, please give more details of your problem.