Bug report: cuDNN doesn't catch invalid configuration for cudnnConvolutionBiasActivationForward

When running cudnnConvolutionBiasActivationForward with the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM algorithm and a CUDNN_ACTIVATION_IDENTITY activation, cuDNN returns CUDNN_STATUS_SUCCESS but computes an incorrect answer. On a closer reading of the Developer Guide, we believe this configuration is invalid, so we would expect the routine to return CUDNN_STATUS_NOT_SUPPORTED instead. The invalidity follows from the second note in the documentation for cudnnConvolutionBiasActivationForward:

    if the mode of the cudnnActivationMode_t field is set to the enum value
    CUDNN_ACTIVATION_IDENTITY, then the input cudnnConvolutionFwdAlgo_t of this
    function cudnnConvolutionBiasActivationForward() must be set to the enum
    value CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM.
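
In code, the note amounts to a check like the following, which we would have expected cudnnConvolutionBiasActivationForward() to perform internally. This is a minimal sketch: the helper function is ours, but cudnnGetActivationDescriptor() and the enum values are the real cuDNN API.

#include <cudnn.h>

/* Hypothetical caller-side guard mirroring the quoted note. */
static cudnnStatus_t checkConvBiasActCombo(cudnnActivationDescriptor_t activationDesc,
                                           cudnnConvolutionFwdAlgo_t algo)
{
    cudnnActivationMode_t mode;
    cudnnNanPropagation_t nanOpt;
    double coef;
    cudnnStatus_t st = cudnnGetActivationDescriptor(activationDesc, &mode, &nanOpt, &coef);
    if (st != CUDNN_STATUS_SUCCESS)
        return st;

    /* Per the note: IDENTITY activation is only valid with the
     * IMPLICIT_PRECOMP_GEMM forward algorithm. */
    if (mode == CUDNN_ACTIVATION_IDENTITY &&
        algo != CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM)
        return CUDNN_STATUS_NOT_SUPPORTED;

    return CUDNN_STATUS_SUCCESS;
}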

The likely-invalid call, as captured by the cuDNN API log, is:

I! CuDNN (v7201) function cudnnConvolutionBiasActivationForward() called:
i!     handle: type=cudnnHandle_t; streamId=(nil) (defaultStream);
i!     alpha1: type=CUDNN_DATA_FLOAT; val=1.000000;
i!     xDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,3,224,224];
i!         strideA: type=int; val=[150528,50176,224,1];
i!     xData: location=dev; addr=0x7fdfec600000;
i!     wDesc: type=cudnnFilterDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         vect: type=int; val=0;
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[12,3,3,3];
i!         format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
i!     wData: location=dev; addr=0x7fdfec400000;
i!     convDesc: type=cudnnConvolutionDescriptor_t:
i!         mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         mathType: type=cudnnMathType_t; val=CUDNN_DEFAULT_MATH (0);
i!         arrayLength: type=int; val=2;
i!         padA: type=int; val=[1,1];
i!         strideA: type=int; val=[1,1];
i!         dilationA: type=int; val=[2,2];
i!         groupCount: type=int; val=1;
i!     algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM (0);
i!     workSpace: location=dev; addr=NULL_PTR;
i!     workSpaceSizeInBytes: type=size_t; val=0;
i!     alpha2: type=CUDNN_DATA_FLOAT; val=0.000000;
i!     zDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,12,222,222];
i!         strideA: type=int; val=[591408,49284,222,1];
i!     zData: location=dev; addr=0x7fdfec800000;
i!     biasDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,12,1,1];
i!         strideA: type=int; val=[12,1,1,1];
i!     bias: location=dev; addr=0x7fdff0e00c00;
i!     activationDesc: type=cudnnActivationDescriptor_t: 
i!         coef: type=double; val=0.000000;
i!         mode: type=cudnnActivationMode_t; val=CUDNN_ACTIVATION_IDENTITY (5);
i!         reluNanOpt: type=cudnnNanPropagation_t; val=CUDNN_NOT_PROPAGATE_NAN (0);
i!     yDesc: type=cudnnTensorDescriptor_t:
i!         dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i!         nbDims: type=int; val=4;
i!         dimA: type=int; val=[1,12,222,222];
i!         strideA: type=int; val=[591408,49284,222,1];
i!     yData: location=dev; addr=0x7fdfec800000;
i! Time: 2018-11-29T17:39:35.758043 (0d+0h+0m+1s since start)
i! Process=21910; Thread=21910; GPU=0; Handle=0x20051b0; StreamId=(nil) (defaultStream).

We've since switched to a valid but slower configuration, but it would have saved us time had cuDNN rejected the invalid one up front. For reference, one configuration the quoted note does permit is sketched below.
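
The sketch keeps the CUDNN_ACTIVATION_IDENTITY activation and switches the algorithm to CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM, which may require a workspace. It assumes the handle, descriptors, and device buffers from the log above (handle, xDesc/xData, wDesc/wData, convDesc, zDesc/zData, biasDesc/bias, activationDesc, yDesc/yData) are still set up; it is not necessarily the exact configuration we moved to.

/* Switch to the algorithm the note requires for IDENTITY activation. */
cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;

/* This algorithm may need scratch space, unlike the zero-byte workspace
 * in the log above. */
size_t wsSize = 0;
cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc, convDesc, yDesc,
                                        algo, &wsSize);
void *workSpace = NULL;
if (wsSize > 0)
    cudaMalloc(&workSpace, wsSize);

/* Same scaling factors as the logged call: y = conv(x, w) + bias. */
float alpha1 = 1.0f, alpha2 = 0.0f;
cudnnStatus_t st = cudnnConvolutionBiasActivationForward(
    handle, &alpha1, xDesc, xData, wDesc, wData, convDesc, algo,
    workSpace, wsSize, &alpha2, zDesc, zData, biasDesc, bias,
    activationDesc, yDesc, yData);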