As a rough rule, you can usually expect all Quadro cards at or above the Quadro 4000 tier of a given generation to support TCC (for the current ‘Turing’ generation, that means the Quadro RTX 4000 / 5000 / 6000 / 8000). That is at least our experience so far. I don’t think that a Quadro P400 supports TCC.
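If you want to check what a particular card is actually running, rather than rely on the rule of thumb, `nvidia-smi` can report the driver model per GPU. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH and that you query the `driver_model.current` field; the helper names `parse_driver_models` and `query_driver_models` are my own and not part of any NVIDIA tool.

```python
import csv
import io
import subprocess

# Query one CSV row per GPU: index, product name, current driver model.
# On Windows the last field is typically "WDDM" or "TCC".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_model.current",
    "--format=csv,noheader",
]


def parse_driver_models(csv_text):
    """Parse the CSV text produced by QUERY into a list of dicts."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        {"index": int(r[0]), "name": r[1].strip(), "driver_model": r[2].strip()}
        for r in rows
        if len(r) == 3  # skip blank or malformed lines
    ]


def query_driver_models():
    """Run nvidia-smi and return the parsed result (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_driver_models(out)


# Typical use on a machine with an NVIDIA driver installed:
#     for gpu in query_driver_models():
#         print(gpu["index"], gpu["name"], gpu["driver_model"])
```

On a card that supports it, switching is normally done with `nvidia-smi -i <index> -dm 1` from an administrator prompt, followed by a reboot. If the card does not support TCC, the command will refuse.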
Which cards get TCC is, I suppose, more a matter of market segmentation (and of testing, support and qualification) than of hardware limitations. The same applies to double-precision capabilities etc. Any company is of course free to segment its market as it sees fit; Intel, for example, does the same with Core / Xeon.
It seems that PyTorch is not very well optimized for the specifics of WDDM (where one should avoid short kernel launches and repeated memory allocations). Regarding YOLOv3, it might be better to switch to the ‘darknet’ framework (https://github.com/pjreddie/darknet), which provides fast inference (it uses cuDNN internally), also on Windows.
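If you have to stay on WDDM with your current framework, one generic way to reduce the number of short launches is to feed the network several inputs per call instead of one at a time. The following is only a sketch of the grouping step in plain Python; the helper name `batched` and the batch size are my own assumptions, and whether this helps in your case depends on your model and memory budget.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # input exhausted
        yield batch
```

With PyTorch you would then stack each batch into a single tensor (e.g. with `torch.stack`) and run the model once per batch, ideally reusing the same input buffer between iterations so that no new GPU memory is allocated per frame.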