OptiX 6.5 - Multi-GPU

I want to move my OptiX application, which currently runs on version 6.5, to a multi-GPU setup.
From the documentation I inferred that OptiX uses all available GPUs by default. It does, but performance is slower than with a single GPU. That leads to a few questions. I have quite a number of buffers marked as RT_BUFFER_OUTPUT (roughly 10 or so). What happens to these buffers in a multi-GPU setup? Is there a copy of each of them on every GPU, with a sync step after the computation is done? Or do all the buffers reside on the host, with the data computed and transferred via PCIe? Does the same happen for RT_BUFFER_INPUT?

Please have a look at the following threads about multi-GPU topics in OptiX 6 and earlier:
Look for “pinned memory” and RT_BUFFER_GPU_LOCAL in these explanations.

There are also sections in the OptiX 6.5.0 programming guide that touch on multi-GPU:

That said, with OptiX 7 you would have explicit control over all multi-GPU behavior, because OptiX 7 itself knows nothing about multiple devices. That part is handled entirely by the CUDA host code you control!

The OptiX 7 applications linked here contain one example which shows different methods to distribute the rendering workload of one frame over multiple GPUs:
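To make "distributing the workload of one frame" a bit more concrete, here is a minimal host-side sketch of one possible scheme: cut the frame into fixed-size tiles and hand them out to the devices round-robin. This is not code from the linked examples and may not match what they do. The names `Tile` and `distributeTiles` and the tile size are hypothetical, and the sketch is plain C++ with no OptiX or CUDA calls. In a real application each device would then launch only over its own tiles and the results would be combined afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper type: one rectangular region of the frame
// and the index of the device assumed to render it.
struct Tile
{
    int x, y;    // top-left pixel of the tile
    int w, h;    // tile extent, clipped at the frame border
    int device;  // index into the application's own device list
};

// Split a width x height frame into tileSize x tileSize tiles and assign
// them to deviceCount devices in round-robin order (row-major tile index).
// Returns an empty list for invalid arguments.
std::vector<Tile> distributeTiles(int width, int height, int tileSize, int deviceCount)
{
    std::vector<Tile> tiles;
    if (width <= 0 || height <= 0 || tileSize <= 0 || deviceCount <= 0)
    {
        return tiles;
    }

    int index = 0;  // running row-major tile index
    for (int y = 0; y < height; y += tileSize)
    {
        for (int x = 0; x < width; x += tileSize)
        {
            Tile t;
            t.x = x;
            t.y = y;
            t.w = std::min(tileSize, width - x);   // clip the last column
            t.h = std::min(tileSize, height - y);  // clip the last row
            t.device = index % deviceCount;        // round-robin assignment
            tiles.push_back(t);
            ++index;
        }
    }
    return tiles;
}
```

For example, `distributeTiles(1920, 1080, 64, 2)` would give each of two GPUs roughly half of the tiles, interleaved across the image so that expensive regions are less likely to land on a single device. Note that with a plain row-major round-robin, a tile count per row that is a multiple of the device count puts whole tile columns on one device, so a different interleaving may balance better.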