How to run SFU together with FMA?

Hello everyone! Given that the special function units (SFUs) and the multiply/add ALUs are separate physical units, and given that the warp schedulers can issue instructions from at least 2 warps per clock cycle (Fermi and up), the GPU must be able to run both kinds of unit at the same time.

However, it seems to me that a lot of luck is required to have 2 warps ready to be scheduled at the same time, 1 needing to do FMA ops and the other pure SFU. That said, there has to be a way to force this coincidence to happen; otherwise special function instructions are basically “serialized” with the rest of the instructions. How can one detect that the SFUs are/were busy while the other ALUs were also busy?

Can anyone show a simple example where the dual warp scheduling capability is exploited? I have tried with no luck… thanks.

"1 needing to do FMA ops "
Not “operations” but “operation”: if one warp issues an FMA instruction, the other can issue an SFU instruction.

So, are there any examples that force all compute units to work, both the ALUs and the SFUs?
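
One way to probe this is a timing microbenchmark. This is only a rough sketch, not a definitive test. It times three variants of the same kernel: an FMA-only chain, an SFU-only chain, and both chains together as independent instruction streams. Everything here (the kernel name, ITERS, BLOCKS, THREADS, the constants) is made up for illustration. It also assumes that __sinf() compiles to an SFU (MUFU) instruction on your architecture, which you can check in the disassembly with cuobjdump -sass.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes, chosen only so each kernel runs for a measurable time.
constexpr int ITERS   = 1 << 16;
constexpr int BLOCKS  = 1024;
constexpr int THREADS = 256;

// MODE 0: FMA chain only. MODE 1: SFU chain only. MODE 2: both chains.
// The two chains never read each other's values, so the hardware is free
// to issue them to different units in the same cycle if it can.
template <int MODE>
__global__ void probe(float *out, float seed)
{
    float a = seed + threadIdx.x;          // feeds the FMA chain
    float b = seed + 1.0f + threadIdx.x;   // feeds the SFU chain
    #pragma unroll 16
    for (int i = 0; i < ITERS; ++i) {
        if (MODE != 1) a = fmaf(a, 0.999f, 0.001f);  // multiply/add ALU
        if (MODE != 0) b = __sinf(b);                // assumed to use the SFU
    }
    // Store the results so the compiler cannot remove the loops.
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b;
}

// Runs one warm-up launch, then times a second launch with CUDA events.
template <int MODE>
float timeKernel(float *d_out)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    probe<MODE><<<BLOCKS, THREADS>>>(d_out, 0.5f);   // warm-up
    cudaEventRecord(start);
    probe<MODE><<<BLOCKS, THREADS>>>(d_out, 0.5f);   // timed run
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, BLOCKS * THREADS * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    float tFma  = timeKernel<0>(d_out);
    float tSfu  = timeKernel<1>(d_out);
    float tBoth = timeKernel<2>(d_out);

    printf("FMA only : %.3f ms\n", tFma);
    printf("SFU only : %.3f ms\n", tSfu);
    printf("FMA + SFU: %.3f ms (sum of the two: %.3f ms)\n", tBoth, tFma + tSfu);

    cudaFree(d_out);
    return 0;
}
```

Build it with something like nvcc -O3 probe.cu -o probe (the file name is arbitrary). If the combined time comes out close to the larger of the two single-unit times, the FMA and SFU work is overlapping. If it is close to their sum, the two kinds of instruction are effectively serialized on that GPU. Results between those extremes are possible too, since instruction issue bandwidth can become the limit before either unit is saturated.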