How to avoid repeated memory allocation and data copies

Hello all,
I am new to CUDA programming. I am doing a computation that mixes C++ and CUDA C. In my wrapper function, I allocate memory for the device arrays, copy data from the CPU to the GPU, launch a kernel to do the calculation, and then copy the results back to the CPU. I need to call the wrapper many times, and several of the CPU arrays only need to be copied to the GPU once; they do not need to be refreshed on every call. Right now I allocate memory and copy all the data every time I call the wrapper, which is quite inefficient. I am wondering how I can do the following to make the program more efficient:
(1) allocate the device memory before calling the wrapper, to avoid repeated allocation
(2) or allocate the memory the first time the wrapper is called, keep it for later calls, and free it at the very end
(3) for the CPU arrays that never change, copy them to the GPU once and keep them there for later calls to the wrapper
I am sure some of you know how to avoid the repeated memory allocation and data copies. Could you please show me how to do it? It would be great if you could give some examples. I would really appreciate it. Thanks!
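
For illustration, here is a minimal sketch of one way points (1) and (3) could look, assuming float arrays of a fixed length n. All names here (GpuContext, gpu_init, gpu_run, gpu_free, multiply) are hypothetical, and the kernel is only a placeholder for the real calculation. The idea is to keep the device pointers in a struct that outlives the wrapper calls: allocate and upload the unchanging data once, copy only the changing data on each call, and free everything at the very end.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical container for device pointers that persist across wrapper calls.
struct GpuContext {
    float *d_static = nullptr;  // unchanging data, uploaded once
    float *d_input  = nullptr;  // data refreshed on every call
    float *d_output = nullptr;  // results of every call
    size_t n = 0;               // number of elements in each array
};

// Placeholder kernel standing in for the real calculation: out[i] = fixed[i] * in[i].
__global__ void multiply(const float *fixed, const float *in, float *out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = fixed[i] * in[i];
}

// Call once at the very end: release all device buffers.
void gpu_free(GpuContext &ctx) {
    cudaFree(ctx.d_static);  // cudaFree on a null pointer is a no-op
    cudaFree(ctx.d_input);
    cudaFree(ctx.d_output);
    ctx.d_static = ctx.d_input = ctx.d_output = nullptr;
    ctx.n = 0;
}

// Call once before the first wrapper call: allocate every device buffer
// and upload the data that never changes.
cudaError_t gpu_init(GpuContext &ctx, const float *h_static, size_t n) {
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc((void **)&ctx.d_static, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&ctx.d_input, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&ctx.d_output, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(ctx.d_static, h_static, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) { gpu_free(ctx); return err; }
    ctx.n = n;
    return cudaSuccess;
}

// The wrapper, called many times: no allocation, and only the changing
// input and the results cross the CPU/GPU boundary.
cudaError_t gpu_run(GpuContext &ctx, const float *h_input, float *h_output) {
    if (ctx.n == 0 || ctx.d_static == nullptr) return cudaErrorInvalidValue;
    size_t bytes = ctx.n * sizeof(float);
    cudaError_t err = cudaMemcpy(ctx.d_input, h_input, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;

    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((ctx.n + threads - 1) / threads);
    multiply<<<blocks, threads>>>(ctx.d_static, ctx.d_input, ctx.d_output, ctx.n);
    err = cudaGetLastError();  // reports launch errors
    if (err != cudaSuccess) return err;

    // This blocking copy also waits for the kernel to finish.
    return cudaMemcpy(h_output, ctx.d_output, bytes, cudaMemcpyDeviceToHost);
}
```

The C++ side would call gpu_init once, gpu_run as many times as needed, and gpu_free once at the end. Point (2) is the same pattern with gpu_init called from inside the wrapper the first time it runs, for example when ctx.d_static is still null.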