Background:
Until cuDNN 8, frameworks used cudnnGetConvolutionForwardAlgorithm. cuDNN 8 removed that API, which forced users to switch to the _v7-suffixed variant (henceforth referred to as the v7 API in this post). To avoid conditionally compiled code for cuDNN 7 and 8, I tried using the v7 API with cuDNN 7 as well. Unfortunately, the two APIs don’t seem to be interchangeable.
The main discrepancy is that the v7 API returns WINOGRAD_NONFUSED in some situations where the non-v7 API does not. Based on limited tests on hundreds of convolution configurations, it appears that the non-v7 API never returns WINOGRAD_NONFUSED. I checked the results from both APIs against autotuned results, and the v7 API’s heuristics appear to agree better with them.
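For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch of how the agreement with autotuning could be scored. It assumes each API's choice and the autotuned winner have already been collected into dicts keyed by a configuration label. The function names and the dict layout are hypothetical, not part of cuDNN, and the algorithm names are just string labels.

```python
def agreement_rate(heuristic, autotuned):
    """Fraction of shared configurations where the heuristic's pick
    equals the autotuned winner.

    Both arguments are hypothetical dicts: config label -> algorithm name.
    """
    shared = heuristic.keys() & autotuned.keys()
    if not shared:
        raise ValueError("no configurations in common")
    matches = sum(1 for cfg in shared if heuristic[cfg] == autotuned[cfg])
    return matches / len(shared)


def disagreements(heuristic, autotuned):
    """Sorted list of (config, heuristic pick, autotuned pick) for mismatches."""
    shared = sorted(heuristic.keys() & autotuned.keys())
    return [
        (cfg, heuristic[cfg], autotuned[cfg])
        for cfg in shared
        if heuristic[cfg] != autotuned[cfg]
    ]
```

Calling agreement_rate once per API against the same autotuned dict gives two directly comparable numbers, and disagreements shows which configurations account for the gap.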
Question:
TensorFlow’s cuDNN 8 PR skips WINOGRAD_NONFUSED when selecting an algorithm from those returned by the v7 API. Why does it do so?
Is there any advice on how to move from the non-v7 API to the v7 API? Switching directly to the v7 API feels like the natural way to go, but the TF PR makes me question that.
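To make the question concrete, here is a rough Python sketch of the selection step as I understand it. The v7 API returns a ranked list of candidates rather than a single algorithm, so the caller has to walk the list and take the first entry that is usable. The AlgoPerf class below is a hypothetical stand-in for the fields of cudnnConvolutionFwdAlgoPerf_t that matter here (algo, status, memory), and pick_algorithm and its skip parameter are my own names, not TensorFlow's actual code.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Optional


@dataclass(frozen=True)
class AlgoPerf:
    """Hypothetical stand-in for one entry of the v7 API's result array."""
    algo: str             # algorithm name, e.g. "WINOGRAD_NONFUSED"
    ok: bool              # True if the entry's status reports success
    workspace_bytes: int  # workspace the algorithm would need


def pick_algorithm(
    ranked: Iterable[AlgoPerf],
    workspace_limit: int,
    skip: AbstractSet[str] = frozenset(),
) -> Optional[str]:
    """Return the first usable algorithm in ranked order, or None.

    An entry is usable if its status is OK, it fits in the workspace
    limit, and it is not in the caller-supplied skip set.
    """
    for perf in ranked:
        if not perf.ok:
            continue
        if perf.workspace_bytes > workspace_limit:
            continue
        if perf.algo in skip:
            continue
        return perf.algo
    return None
```

With an empty skip set this takes the v7 heuristic's top usable choice as is. Passing skip={"WINOGRAD_NONFUSED"} is my reading of the kind of filtering the TF PR applies, and running both variants over the same configurations would show how often the filter changes the outcome.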