With reference to CUDA 2.3: Each GPU in an SLI group is now enumerated individually, so compute applications can now take advantage of multi-GPU performance even when SLI is enabled for graphics.
Does that mean my applications, which were written for a single GPU, will run faster on a system with two GPUs in SLI mode, without modifying the kernel launches or making any other change to the source?
No. It means that before this change, if you asked how many GPUs there were in a system with SLI enabled, you would only see the first one. A CUDA kernel never executes on more than one GPU. You have to start two host threads and bind each one to a different GPU, and then each host thread can launch kernels on its associated device.
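Something like the sketch below (untested, just to show the pattern; the kernel name scale_kernel, the buffer size, and the pthread setup are my own illustration, not from the original post). One host thread per device, each calling cudaSetDevice() before any other CUDA call, then doing its own allocations and kernel launches:

#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

// Trivial placeholder kernel so each GPU has something to run.
__global__ void scale_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

// Each host thread binds to one GPU and launches work on it.
static void *worker(void *arg)
{
    int dev = *(int *)arg;
    cudaSetDevice(dev);                  // bind this host thread to one device

    const int n = 1 << 20;
    float *d_data = 0;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    scale_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaThreadSynchronize();             // CUDA 2.x-era synchronization call

    cudaFree(d_data);
    return 0;
}

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);          // with CUDA 2.3, SLI GPUs are enumerated individually
    printf("Found %d CUDA device(s)\n", count);

    pthread_t threads[16];
    int ids[16];
    if (count > 16) count = 16;
    for (int d = 0; d < count; ++d) {
        ids[d] = d;
        pthread_create(&threads[d], 0, worker, &ids[d]);
    }
    for (int d = 0; d < count; ++d)
        pthread_join(threads[d], 0);
    return 0;
}

The key point is that the device binding is per host thread, so splitting work across GPUs means splitting it across threads (or processes) yourself; a single-threaded, single-GPU program gets no automatic speedup from the second card.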