Please have a look at this chapter of the OptiX 6.5.0 Programming Guide, which explains the only way to get asynchronous launches in OptiX 6 versions:
https://raytracing-docs.nvidia.com/optix6/guide_6_5/index.html#post_processing_framework#asynchronous-launches
Any other launch mechanism is synchronous.
(Correction) The cuMemcpy2D is most likely an update of an internal data structure, which happens whenever you change anything in the scene or in the rtVariables between launches.
Because of that, it’s recommended to put variables you need to change regularly between launches into an input buffer and update its contents instead.
https://forums.developer.nvidia.com/t/recompile-question/71937/2
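A minimal sketch of that recommendation, not taken from the linked thread: the values that change per launch are packed into one plain struct and copied into the mapped input buffer instead of being set as individual variables. `LaunchParams`, its fields, and `writeLaunchParams` are hypothetical names. In an OptiX 6 application the destination pointer would come from rtBufferMap() and be released with rtBufferUnmap() before the next launch; the function below only takes a raw pointer so that it compiles without the SDK.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical per-launch parameters; the field names are placeholders.
struct LaunchParams
{
    float        cameraPosition[3];
    unsigned int frameIndex;
};

// The struct is copied bytewise, so it must be trivially copyable.
static_assert(std::is_trivially_copyable<LaunchParams>::value,
              "LaunchParams must be trivially copyable");

// Copies the parameters to 'mapped'.
// Assumption: in OptiX 6, 'mapped' is the pointer returned by rtBufferMap() for a
// user-format input buffer holding one LaunchParams element, and rtBufferUnmap()
// is called after this function returns.
void writeLaunchParams(void* mapped, const LaunchParams& params)
{
    std::memcpy(mapped, &params, sizeof(LaunchParams));
}
```

Per frame, the call would sit between rtBufferMap() and rtBufferUnmap(), right before the launch, so only the buffer contents change from one launch to the next.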
That doesn’t happen in OptiX 7 because updating any data there is your responsibility and is done with CUDA API calls.
For complete control over GPU parallelism like that, you would need to use OptiX 7, which uses native CUDA for the management of devices, contexts, and streams.
The optixLaunch() calls there are asynchronous and take a stream as an argument. You decide what happens asynchronously and when.