"Concurrent copy and execution" in OpenCL How to?

Hi,

Can someone give me a hint on how (if it is possible at all) to implement “concurrent copy and execution” using OpenCL, i.e. overlapping host-device memory transfers with kernel execution?

I’m losing too much precious computing time waiting for uploads to and downloads from the GPU.

Thanks,
mark