Strange TCC mode behavior

I have a 64-bit Windows 10 computer with one GTX 760 GPU (recognized by nvidia-smi as id=0) and one Titan X (id=1). Under WDDM, my simple CUDA 7.5 program with an infinite-loop kernel times out and crashes after a couple of seconds, as expected. After running nvidia-smi -dm 1 and rebooting, the program runs without timing out, also as expected. Somewhat surprisingly, the display stays fully responsive even though cudaGetDevice returns 0, the id of the GTX 760 used for display. So I added cudaSetDevice(1) at the top of my program to make it run on the Titan X, but then my app crashes after 3 or 4 seconds, which is totally unexpected.

Could someone please help me understand this strange behavior and figure out how to run my app on the Titan X?

CUDA enumerates the GPUs in an order that may not match the one nvidia-smi reports.

By default, the CUDA runtime usually puts what it considers the most powerful GPU first, at index 0.

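So with no explicit selection, or with cudaSetDevice(0), you were actually running on the Titan X. That is why the display stayed responsive and the kernel did not time out.

When you called cudaSetDevice(1), you started running on the GTX 760, which drives the display and is therefore still subject to the timeout.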

Run the deviceQuery sample code to see the order in which CUDA enumerates your devices.
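If you would rather not hard-code an index, here is a minimal sketch that lists the devices in CUDA's own order and selects one by name. The helper name findDeviceByName is hypothetical, and the substring "TITAN X" is an assumption about how the card reports itself, so check it against the printed list before relying on it.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: prints every device in CUDA runtime order and returns
// the index of the first one whose name contains `needle`, or -1 if none does.
static int findDeviceByName(const char* needle)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    int match = -1;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;

        // tccDriver and kernelExecTimeoutEnabled show which GPU is in TCC
        // mode and which one is still subject to the watchdog timeout.
        printf("CUDA device %d: %s (PCI bus %d, TCC=%d, kernel timeout=%d)\n",
               i, prop.name, prop.pciBusID, prop.tccDriver,
               prop.kernelExecTimeoutEnabled);

        if (match < 0 && strstr(prop.name, needle) != NULL) match = i;
    }
    return match;
}

int main()
{
    // Assumption: the Titan X reports a name containing "TITAN X".
    int dev = findDeviceByName("TITAN X");
    if (dev < 0) {
        fprintf(stderr, "No matching CUDA device found\n");
        return 1;
    }

    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Selected CUDA device %d\n", dev);
    // ... launch kernels here ...
    return 0;
}
```

Alternatively, setting the environment variable CUDA_DEVICE_ORDER=PCI_BUS_ID before launching the program should make CUDA enumerate devices in PCI bus order, which as far as I know is also the order nvidia-smi uses.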