I think it’s because of WDDM. Of course I’m having difficulty finding information on which cards support TCC. Like, if I go buy a cheap P400, will that support it? It pisses me off that I can’t do it on my $1,400 2080 Ti.
Does it though? Did you actually read anything you linked? The laptop one is just completely unrelated, and in the Stack Overflow post they say Quadro cards, which is why I asked about the P400. But who knows?
Most companies publish a compatibility matrix for their products. I shouldn’t have to rely on random people on Stack Overflow.
I posted a legitimate bug in this one and expressed my frustrations with continued anti-consumer practices. It’s not childish; there’s literally no other venue to express your frustrations at arbitrary decisions. The fact that people can flash custom firmware on older cards and modify drivers on newer ones clearly shows it’s not a hardware limitation.
“TCC mode should be available for Tesla GPUs, most Quadro desktop GPUs, and GeForce Titan family”
I’m sorry, but I don’t want to spend my money to test a bug theory based on some guy’s forum post where he says it should work. How about some official documentation?
As a rough rule, you can usually expect all Quadro cards of a generation at or above the Quadro 4000 tier (so e.g. for the current ‘Turing’ generation, the Quadro RTX 4000 / 5000 / 6000 / 8000) to support TCC. That is at least our experience so far. I don’t think that a Quadro P400 supports TCC.
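If you have a card in hand and want to check it directly rather than rely on rules of thumb, the CUDA runtime exposes a `tccDriver` field in `cudaDeviceProp` (1 when the device is running under the TCC driver). A minimal host-side sketch, assuming a working CUDA toolkit install:

```cuda
// check_tcc.cu -- report the driver model of each visible GPU.
// Build with: nvcc check_tcc.cu -o check_tcc
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA devices found (or driver problem).\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.tccDriver is 1 if the device is running the TCC driver,
        // 0 otherwise (i.e. WDDM on Windows, or a non-Windows platform).
        printf("GPU %d: %s -> %s\n", i, prop.name,
               prop.tccDriver ? "TCC" : "WDDM (or non-Windows)");
    }
    return 0;
}
```

On cards where switching is supported, `nvidia-smi -i 0 -dm 1` (run as administrator, followed by a reboot) requests TCC mode; on GPUs where TCC is not supported, nvidia-smi reports an error instead, which answers the question without any firmware experiments.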
The decision of which cards get TCC is, I suppose, more one of market segmentation (and of testing, support & qualification) than of hardware limitations. The same applies to double-precision capabilities etc. Any company is of course free to segment the market as they see fit; e.g. Intel does the same with Core / Xeon.
It seems that PyTorch is not very well optimized for the specifics of WDDM (where you want to avoid short kernel launches and repeated memory allocations). Regarding YOLOv3, it might be better to switch to the ‘darknet’ framework (GitHub - pjreddie/darknet: Convolutional Neural Networks), which provides fast inference (it uses cuDNN internally) on Windows as well.
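To make the WDDM advice concrete, here is a hedged sketch of the pattern (the kernel and function names are made up for illustration): allocate device memory once up front and launch over whole buffers, instead of paying the WDDM launch/allocation overhead inside the hot loop.

```cuda
// Illustrates "avoid repeated allocations, avoid many tiny launches" under WDDM.
// Anti-pattern: cudaMalloc/cudaFree inside the loop, one small launch per item.
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

void process(float *host, int n, int iterations) {
    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));   // allocate ONCE, outside the loop
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    for (int it = 0; it < iterations; ++it) {
        // one launch covering the whole buffer per iteration,
        // rather than many short launches over small slices
        scale<<<blocks, threads>>>(dev, n, 1.001f);
    }

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);                         // free ONCE at the end
}
```

The same idea applies inside PyTorch code: reuse pre-allocated tensors and batch work into fewer, larger operations where you can.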
My own observations match (or at least do not contradict) HannesF99’s rule of thumb as far as TCC support is concerned. As with any rule of thumb, there can be no guarantees.
It might help all CUDA users stuck with WDDM (for whatever reason) if as many people as possible complain to Microsoft about the poor performance of the WDDM driver model, for example by pointing out that it is not performance-competitive with the Linux driver model.