(Disclaimer: I don’t have a Titan yet, so this is based on my research while evaluating the card.)
One thing to note is that the Titan is closer to the K20X than to the K20. The K20 has only 13 SMXs and a 320-bit memory bus, whereas the K20X has 14 SMXs and a 384-bit memory bus, like the Titan. The Titan also has higher core and memory clock rates, so it is faster than the K20X. According to Anandtech, if you enable full-speed double precision on the Titan in the driver, the core clock rate will be reduced to handle the larger thermal load. Based on statements in their review, even with that underclocking, the Titan should still be faster than the K20X at double precision as well.
I think the main limitation of the Titan for your scenario is the performance of the non-TCC driver on Windows. There have been many complaints about kernel launch overhead and latency with GeForce cards on Windows. The other major limitation is the number of enabled DMA (copy) engines on the card. A Tesla card can perform simultaneous bidirectional transfers with its two DMA engines, whereas the Titan only has one enabled, so host-to-device and device-to-host copies cannot overlap each other.
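If you want to verify this yourself, the number of enabled copy engines is exposed through the runtime API as `cudaDeviceProp::asyncEngineCount`. A minimal query sketch (standard device-enumeration code, nothing vendor-specific assumed):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the number of async copy (DMA) engines on each visible device.
// asyncEngineCount == 2: host-to-device and device-to-host copies can
// overlap each other as well as kernel execution (Tesla-class cards).
// asyncEngineCount == 1: only one copy direction can overlap kernels.
int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        printf("No CUDA devices found.\n");
        return 0;
    }
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d (%s): %d copy engine(s)\n",
               i, prop.name, prop.asyncEngineCount);
    }
    return 0;
}
```

A K20/K20X should report 2 copy engines here; GeForce cards typically report 1.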
A less concrete issue is the lack of support and testing that GeForce cards receive for 24/7 production CUDA use. I have never worried about this since I use CUDA exclusively in a research context, but I can imagine it mattering depending on your situation. However, I have no idea what kind of support you get with a Tesla purchase, or whether that is worth the extra 2*$2800 to you.