I have two 8800 GTX cards. Whether SLI is enabled or not, my CUDA program gives roughly the same performance, and that performance matches another machine with only one GeForce.
My CUDA program runs a Monte Carlo simulation (a derivative of your project), so the computation is massively parallel.
If I run your Mersenne Twister project, for example, I get nearly identical results:
with SLI: RandomGPU time: 8.49; BoxMuller GPU time: 4.45
without SLI: RandomGPU time: 8.55; BoxMuller GPU time: 4.57
GeForce 1: IRQ 0, PCI Express x16
GeForce 2: IRQ 16, PCI Express x8
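In case it helps, here is a minimal sketch (not taken from my program) of how I would check that CUDA enumerates both cards regardless of the SLI setting. It only uses the standard runtime API calls cudaGetDeviceCount and cudaGetDeviceProperties:

```cuda
// Minimal device-enumeration sketch: list every GPU the CUDA
// runtime can see, with its name and compute capability.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("CUDA devices visible: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("  device %d: %s (compute %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

If this reports two devices, then both cards are usable, and the program would presumably have to call cudaSetDevice from separate host threads to put both of them to work.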
I have installed the SDK/toolkit version 1.1 plus the matching driver.
I compile with VC++ 2005 (the light edition).
Thanks for your help.