CLIPPED_RELU vs cudnnConvolutionBiasActivationForward()


I was trying to optimize execution of Con2D->Bias->ReLU_with_upper_limit (e.g. ReLU6) with cudnnConvolutionBiasActivationForward(). I have noticed that cudnnConvolutionBiasActivationForward behavior is not consistent with its documentation. It should return NOT_SUPPORTED in case of CLIPPED_RELU activation. However, it does not and executes without an error. On the other hand, such operation ignores clipping threshold for activation and behaves like a typical ReLU.

So, where is the problem? cudnnConvolutionBiasActivationForward should return NOT_SUPPORTED for CLIPPING_RELU or it should correctly clip values? Is there a way to optimize Con2D->Bias->ReLU6 operation on GPUs with ComputeCapability<7.5? I was trying to use Backend API, but it ended in very similar way: no error but not-clipped results.

And by the way: from documentation of coef parameter in cudnnSetActivationDescriptor() :

Input . Floating point number. When the activation mode (see cudnnActivationMode_t) is set to CUDNN_ACTIVATION_CLIPPED_RELU, this input specifies the clipping threshold; and when the activation mode is set to CUDNN_ACTIVATION_RELU, this input specifies the upper bound.

What is the upper bound for CUDNN_ACTIVATION_RELU?

My specs:
Windows 10 with CUDA 11.1and cuDNN 8.1.1.
GTX1060 6GB (driver: 460.89)