The TCC driver is exclusive to Windows because it (or something equivalent) is not needed on other operating systems supported by CUDA.
Up to and including Windows XP, Windows provided a driver model with low overhead for graphics devices. The downside was that it made it all too easy for graphics drivers to crash the operating system. So with Windows Vista, Microsoft introduced a new driver model, WDDM 1.x, that gave the operating system a large amount of control over, and isolation from, graphics devices.
For example, with WDDM a graphics driver needs to allocate GPU memory through operating system facilities. This approach creates massive overhead. The NVIDIA drivers try to mitigate this overhead as much as possible, for example by batching kernel launches in CUDA. While this helps overall performance, it can also create performance artifacts.
So NVIDIA came up with an alternative driver, the TCC driver, that tells the operating system to treat the GPU as a 3D controller rather than a graphics device, which means the GPU cannot support the GUI. This provides a low-overhead driver environment that is competitive with the Linux driver environment performance-wise. As you have found, and I have heard anecdotally from other people, it may even be a tad faster.
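If you want to check from inside a program which driver model a GPU is running under, the CUDA runtime exposes a tccDriver flag in cudaDeviceProp. Below is a minimal sketch; the helper name driverModelLabel is my own invention for illustration, and the flag simply reads 0 on Linux, where the distinction does not exist.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): turn the tccDriver flag into a label.
const char* driverModelLabel(int tccDriver) {
    return tccDriver ? "TCC" : "not TCC";
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        // prop.tccDriver is 1 when the device uses the TCC driver, 0 otherwise.
        std::printf("GPU %d (%s): %s\n", dev, prop.name,
                    driverModelLabel(prop.tccDriver));
    }
    return 0;
}
```

On Windows, a result of "not TCC" would normally mean the GPU is running under WDDM.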
With Windows 10, Microsoft grabbed control of graphics devices even harder with a new driver model variant, WDDM 2.x. One widely observed side effect of this is that it is not possible for CUDA programs to allocate more than about 81-82% of the GPU’s physical memory.
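To see how much memory the runtime considers available on your own system, you can compare the free and total figures returned by cudaMemGetInfo. This is only a rough sketch: percentOf is a helper name I made up, and the free figure also reflects memory used by other processes and by the CUDA context itself, so it will not necessarily reproduce the 81-82% figure exactly.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): part as a percentage of total.
double percentOf(size_t part, size_t total) {
    return total == 0 ? 0.0 : 100.0 * (double)part / (double)total;
}

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    // Reports free and total memory for the current device.
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    const double mib = 1024.0 * 1024.0;
    std::printf("free: %.0f MiB of %.0f MiB (%.1f%%)\n",
                freeBytes / mib, totalBytes / mib,
                percentOf(freeBytes, totalBytes));
    return 0;
}
```

Running this under WDDM and again under TCC or Linux on the same GPU should make any difference in the available fraction visible.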