Variations in heuristics for selecting conv algorithm in different APIs

Background:

Until cuDNN 8, frameworks used cudnnGetConvolutionForwardAlgorithm. cuDNN 8 removed that API, which forces users to switch to the _v7-suffixed API (henceforth referred to as the v7 API in this post). To avoid conditionally compiled code for cuDNN 7 and 8, I tried using the v7 API with cuDNN 7 as well. Unfortunately, the two APIs don’t seem to be interchangeable.

The main cause of the discrepancy is that the v7 API returns WINOGRAD_NONFUSED in some situations where the non-v7 API does not. Based on limited tests over hundreds of convolution configurations, it appears that the non-v7 API never returns WINOGRAD_NONFUSED. I checked the results from both APIs against autotuned results; the v7 API’s heuristics appear to agree better with them.

Question:

TensorFlow’s cuDNN 8 PR skips WINOGRAD_NONFUSED while selecting an algorithm returned by the v7 API. Why does it do so?

Is there any advice on how to move from the non-v7 API to the v7 API? Switching directly to the v7 API feels like the right way to go, but the TF PR makes that questionable.

Removing WINOGRAD_NONFUSED from the v7 API’s results and selecting the best algorithm gives the same result as the non-v7 API.
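For reference, here is a minimal sketch of that filtering step. It is not the TF code. To keep it compilable without cudnn.h, `FwdAlgo` and `AlgoPerf` are stand-ins that model only the parts of cudnnConvolutionFwdAlgo_t and cudnnConvolutionFwdAlgoPerf_t used here, and `pick_forward_algo` and its parameters are hypothetical names. It assumes the result list is already ordered from fastest to slowest expected performance, which is how I understand the v7 API to return it.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for cudnnConvolutionFwdAlgo_t (assumption: only the identity of
// WINOGRAD_NONFUSED matters for this sketch).
enum class FwdAlgo {
    ImplicitGemm,
    ImplicitPrecompGemm,
    Gemm,
    Winograd,
    WinogradNonfused
};

// Stand-in for cudnnConvolutionFwdAlgoPerf_t, reduced to three fields.
struct AlgoPerf {
    FwdAlgo algo;
    bool ok;             // models status == CUDNN_STATUS_SUCCESS
    std::size_t memory;  // workspace size in bytes
};

// Hypothetical helper: returns the index of the first usable entry in
// `results`, or -1 if none qualifies. `results` is assumed to be sorted
// from best to worst expected performance.
int pick_forward_algo(const std::vector<AlgoPerf>& results,
                      std::size_t workspace_limit,
                      bool allow_winograd_nonfused) {
    for (std::size_t i = 0; i < results.size(); ++i) {
        const AlgoPerf& r = results[i];
        if (!r.ok) continue;                        // heuristic reported a failure
        if (r.memory > workspace_limit) continue;   // needs too much workspace
        if (!allow_winograd_nonfused &&
            r.algo == FwdAlgo::WinogradNonfused) {
            continue;                               // skip, as the TF PR does
        }
        return static_cast<int>(i);
    }
    return -1;
}
```

With `allow_winograd_nonfused` set to false, this picks the best remaining entry, which is the selection I compared against the non-v7 API. Setting it to true keeps the v7 API’s own first choice.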

So the question now is: should WINOGRAD_NONFUSED be ignored or allowed?

Hi @YashasSamaga,

Is there a typo and you meant does not?
For that matter, the reason some frameworks choose to remove “WINOGRAD_NONFUSED” is that it sometimes does not produce very good numerical precision. However, we have not heard of it causing huge convergence problems either. It’s a per-framework decision whether to skip it or not.

For general usage, it would be safer to imitate the old TF behavior and skip it, to avoid any surprises in existing working scripts. If you want performance and have more tolerance for convergence issues, you can choose to allow it.

Thanks!