It’s unclear to me what you mean when you refer to “dGPU’s buffers” and “iGPU’s buffers.” All of the buffers in this swap chain originate from the dGPU driver, although the buffers that the iGPU flips between for PRIME Synchronization are allocated in system memory in a format that the iGPU can understand.
There is an intermediate composition from the X screen’s primary surface into an intermediate video memory buffer (similar to what ForceFullCompositionPipeline = On does) before asynchronously copying from that into the requested system memory back buffer, but that won’t add any latency because it all completes before the iGPU’s next vblank. This is done for performance reasons: the composition step runs in the 3D channel, and we don’t want 3D applications to be blocked behind a relatively slow copy into system memory, as they would be if we composited directly into the system memory back buffer. Fermi GPUs lack this asynchronous copy support, so for lack of a better option they composite directly into system memory at the expense of 3D performance.
One issue may arise from the fact that OpenGL swaps after each composition, potentially adding more latency into the swap chain.
If you want to minimize input lag as much as possible today, your best bet is to set __GL_SYNC_TO_VBLANK=0. Naturally, the application won’t sync to vblank, but due to an implementation detail in PRIME, it will not tear either. Composition is essentially atomic, so incomplete frames are dropped rather than shown torn. Under PRIME, GL Sync to VBlank has more to do with throttling the application to vblank while maintaining smoothness than it does with avoiding tearing.
Disabling GL Sync to VBlank should eliminate the potential additional input lag from the GL->PRIME->iGPU swap chain while keeping the output tear-free. The cost is higher power consumption at framerates much higher than the refresh rate, or reduced smoothness at framerates closer to the refresh rate.
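If you would rather disable GL Sync to VBlank for a single application than for the whole session, one option is to set the variable only in that process’s environment. Here is a minimal sketch in Python; the helper names env_without_gl_vsync and launch_without_gl_vsync are made up for illustration, and the only thing taken from the driver is the __GL_SYNC_TO_VBLANK variable itself.

```python
import os
import subprocess


def env_without_gl_vsync(base_env=None):
    """Return a copy of the environment with GL Sync to VBlank disabled.

    base_env defaults to the current process environment; the input
    mapping is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    # "0" disables syncing to vblank for OpenGL applications on the NVIDIA driver.
    env["__GL_SYNC_TO_VBLANK"] = "0"
    return env


def launch_without_gl_vsync(argv):
    """Run the command in argv with the modified environment.

    Hypothetical helper: blocks until the child exits and returns its exit code.
    """
    return subprocess.run(argv, env=env_without_gl_vsync()).returncode
```

For example, launch_without_gl_vsync(["glxgears"]) would start glxgears (assuming it is installed) with the setting applied to that process only, leaving other applications at their default behavior.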