cudnnAddTensor broadcasting

Hello,
I’m trying to use cudnnAddTensor’s broadcasting feature as described in the API reference ([url]https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnAddTensor[/url]):

[i]This function adds the scaled values of a bias tensor to another tensor. Each dimension of the bias tensor A must match the corresponding dimension of the destination tensor C or must be equal to 1. In the latter case, the same value from the bias tensor for those dimensions will be used to blend into the C tensor.

Note: Up to dimension 5, all tensor formats are supported. Beyond those dimensions, this routine is not supported[/i]

I’ve got 32-bit float tensors A = dims(32,32,1,1,768) and C = dims(32,32,1,128,768), but cudnnAddTensor fails with CUDNN_STATUS_NOT_SUPPORTED. Does anyone have any clue how to resolve this? Thank you in advance.
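For reference, here is a minimal sketch of the failing call as I understand it (error checking omitted, buffer and variable names are just placeholders):

[code]
#include <cudnn.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    // Fully packed strides: stride[i] = product of dims[i+1..4]
    int dimsA[5]    = {32, 32, 1, 1, 768};
    int stridesA[5] = {24576, 768, 768, 768, 1};
    int dimsC[5]    = {32, 32, 1, 128, 768};
    int stridesC[5] = {3145728, 98304, 98304, 768, 1};

    cudnnTensorDescriptor_t aDesc, cDesc;
    cudnnCreateTensorDescriptor(&aDesc);
    cudnnCreateTensorDescriptor(&cDesc);
    cudnnSetTensorNdDescriptor(aDesc, CUDNN_DATA_FLOAT, 5, dimsA, stridesA);
    cudnnSetTensorNdDescriptor(cDesc, CUDNN_DATA_FLOAT, 5, dimsC, stridesC);

    float *dA, *dC;
    cudaMalloc((void **)&dA, sizeof(float) * 32 * 32 * 768);
    cudaMalloc((void **)&dC, sizeof(float) * 32 * 32 * 128 * 768);

    // Per the docs, A's fourth dimension (1) should broadcast against
    // C's (128), but this call returns CUDNN_STATUS_NOT_SUPPORTED.
    float alpha = 1.0f, beta = 1.0f;
    cudnnStatus_t status = cudnnAddTensor(handle, &alpha, aDesc, dA,
                                          &beta, cDesc, dC);
    printf("cudnnAddTensor: %s\n", cudnnGetErrorString(status));

    cudaFree(dA);
    cudaFree(dC);
    cudnnDestroyTensorDescriptor(aDesc);
    cudnnDestroyTensorDescriptor(cDesc);
    cudnnDestroy(handle);
    return 0;
}
[/code]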

Same issue with cuDNN 7.5. It even fails with a simple bias of [4,1,1,1] and C of [4,4,1,1].
The mnistCUDNN example performs these adds:
addBias: bias is [1 20 1 1] and C is [1 20 24 24]
addBias: bias is [1 50 1 1] and C is [1 50 8 8]
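To make the pattern easier to see, here is a rough side-by-side sketch of both shape combinations, using the scaffolding from the snippet above (handle created, alpha/beta set, dBias/dC allocated large enough). Judging by the shapes in this thread, broadcasting over H and W succeeds while broadcasting over the c dimension does not:

[code]
cudnnTensorDescriptor_t biasDesc, cDesc;
cudnnCreateTensorDescriptor(&biasDesc);
cudnnCreateTensorDescriptor(&cDesc);

// mnistCUDNN-style shapes: broadcast over H and W only -- this succeeds.
cudnnSetTensor4dDescriptor(biasDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                           1, 20, 1, 1);
cudnnSetTensor4dDescriptor(cDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                           1, 20, 24, 24);
cudnnAddTensor(handle, &alpha, biasDesc, dBias, &beta, cDesc, dC);
// -> CUDNN_STATUS_SUCCESS

// Broadcast over the c dimension -- this fails.
cudnnSetTensor4dDescriptor(biasDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                           4, 1, 1, 1);
cudnnSetTensor4dDescriptor(cDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                           4, 4, 1, 1);
cudnnAddTensor(handle, &alpha, biasDesc, dBias, &beta, cDesc, dC);
// -> CUDNN_STATUS_NOT_SUPPORTED
[/code]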

I guess that since cuDNN is closed source, its unit test suite is closed source as well?

I can sadly confirm this is still broken in the latest release (7.6.5). However, if you want the speedup from broadcasting, you can use cudnnOpTensor with CUDNN_OP_TENSOR_ADD instead; that works just fine. It feels like cudnnAddTensor should simply be a wrapper around cudnnOpTensor, but apparently it isn’t.
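Here is a rough sketch of the workaround for the [4,1,1,1] + [4,4,1,1] case (names are placeholders, error checking omitted). cudnnOpTensor computes C = op(alpha1*A, alpha2*B) + beta*C, and only the B operand may have broadcast dimensions, so the bias goes in as B:

[code]
#include <cudnn.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    cudnnTensorDescriptor_t xDesc, biasDesc;
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnCreateTensorDescriptor(&biasDesc);
    // Input/output: [4,4,1,1]; bias: [4,1,1,1], broadcast over c.
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               4, 4, 1, 1);
    cudnnSetTensor4dDescriptor(biasDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               4, 1, 1, 1);

    cudnnOpTensorDescriptor_t opDesc;
    cudnnCreateOpTensorDescriptor(&opDesc);
    cudnnSetOpTensorDescriptor(opDesc, CUDNN_OP_TENSOR_ADD, CUDNN_DATA_FLOAT,
                               CUDNN_NOT_PROPAGATE_NAN);

    float *dX, *dBias, *dOut;
    cudaMalloc((void **)&dX, 16 * sizeof(float));
    cudaMalloc((void **)&dBias, 4 * sizeof(float));
    cudaMalloc((void **)&dOut, 16 * sizeof(float));

    // out = 1*x + 1*bias (broadcast over c) + 0*out
    float alpha1 = 1.0f, alpha2 = 1.0f, beta = 0.0f;
    cudnnStatus_t status = cudnnOpTensor(handle, opDesc,
                                         &alpha1, xDesc, dX,
                                         &alpha2, biasDesc, dBias,
                                         &beta, xDesc, dOut);
    printf("cudnnOpTensor: %s\n", cudnnGetErrorString(status));

    cudaFree(dX); cudaFree(dBias); cudaFree(dOut);
    cudnnDestroyOpTensorDescriptor(opDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroyTensorDescriptor(biasDesc);
    cudnnDestroy(handle);
    return 0;
}
[/code]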