TF and PyTorch are slower on Windows than on Linux

I have opened 2 bugs, which have been confirmed.

I think it’s because of WDDM. Of course, I’m having difficulty finding information on which cards support TCC. If I go buy a cheap P400, will that support it? It pisses me off that I can’t do it on my $1,400 2080 Ti.

A single search for “nvidia cards that support TCC” brings up:

Does it, though? Did you actually read anything you linked? The laptop one is completely unrelated, and in the Stack Overflow post they say Quadro cards, which is why I asked about the P400. But who knows?

Most companies publish a compatibility matrix for their products. I shouldn’t have to rely on random people on Stack Overflow.

Did you read the first post? It tells you exactly which class of cards supports TCC.
I can see people are increasingly ignoring childish posts such as this one and your other one ([url][/url]), plus some other cross-posts I’ve seen here and there about this P400.
I think I should just do the same…

I posted a legitimate bug in this one and expressed my frustration with continued anti-consumer practices. It’s not childish; there’s literally no other venue to express frustration at arbitrary decisions. The fact that people can flash custom firmware onto older cards and modify drivers for newer ones clearly shows it’s not a hardware limitation.

“TCC mode should be available for Tesla GPUs, most Quadro desktop GPUs, and GeForce Titan family”

I’m sorry, but I don’t want to spend my money testing a bug theory based on some guy’s forum post where he says it should work. How about some official documentation?

There isn’t a chart anywhere published by NVIDIA that details which cards support TCC and which don’t.

If you want to see a change to CUDA, whether that be performance, behavior, or documentation, I suggest filing a bug. The directions are linked at the top of this forum in a sticky post.

To set expectations: NVIDIA works on things according to its own priorities. Not all filed bugs get pursued to the same extent.

As a rough rule, you can usually expect all Quadro cards of a generation equal to or higher than the Quadro 4000 (so, for the current ‘Turing’ generation, the Quadro RTX 4000 / 5000 / 6000 / 8000) to support TCC. That is at least our experience so far. I don’t think the Quadro P400 supports TCC.
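Rather than buying a card to test the theory, you can ask the driver directly. This is a sketch assuming a reasonably recent driver with `nvidia-smi` on the PATH; on Windows the `driver_model.current` query field reports WDDM or TCC for each GPU:

```shell
# Query the current driver model for each GPU. On Windows this prints WDDM
# or TCC; on Linux the field is N/A because the distinction doesn't exist.
if command -v nvidia-smi >/dev/null 2>&1; then
    model=$(nvidia-smi --query-gpu=name,driver_model.current --format=csv,noheader)
else
    model="unavailable (nvidia-smi not found on this machine)"
fi
echo "Driver model: $model"

# To attempt switching GPU 0 to TCC (requires admin rights, and errors out
# on cards where the driver does not enable TCC):
#   nvidia-smi -i 0 -dm TCC
```

If the `-dm TCC` command errors out, that is the quickest confirmation that a given card is locked to WDDM.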

The decision as to which cards get TCC is, I suppose, more one of market segmentation (and testing, support & qualification) than of hardware limitations. The same applies to double-precision capabilities, etc. Any company is of course free to segment the market as it sees fit; Intel does the same with Core / Xeon, for example.

It seems that PyTorch is not well optimized for the specifics of WDDM (which rewards avoiding short kernel launches and repeated memory allocations). Regarding YOLOv3, it might be better to switch to the darknet framework (GitHub - pjreddie/darknet: Convolutional Neural Networks), which provides fast inference (using cuDNN internally) on Windows as well.
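The allocation half of that advice is easy to illustrate. The sketch below uses NumPy rather than PyTorch (the function names are mine, not any framework’s API), but the pattern is the same one that reportedly helps under WDDM: allocate working buffers once and write into them, instead of creating a fresh array on every step:

```python
import numpy as np

def step_naive(x):
    # Allocates a brand-new output array on every call; on a GPU under
    # WDDM, the analogous pattern triggers OS-managed allocations.
    return x * 2.0 + 1.0

def make_step_preallocated(shape):
    out = np.empty(shape)  # allocated once, reused on every call
    def step(x):
        np.multiply(x, 2.0, out=out)  # write into the reused buffer
        np.add(out, 1.0, out=out)
        return out
    return step

x = np.ones((4, 4))
step = make_step_preallocated(x.shape)
# Both variants compute the same result; only the allocation behavior differs.
assert np.array_equal(step_naive(x), step(x))
```

In PyTorch the analogous trick is reusing tensors (for example via the `out=` argument that many operations accept) and batching work into fewer, larger kernel launches.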

My own observations match (or at least: do not contradict) HannesF99’s rule of thumb as far as TCC support is concerned. As with any rule of thumb, there can be no guarantees.

It might help all CUDA users stuck with WDDM (for whatever reason) if as many people as possible complained to Microsoft about the poor performance of the WDDM driver model, for example by pointing out that it is not performance-competitive with the Linux driver model.