Using the new cuDNN backend API, I am getting CUDNN_STATUS_NOT_SUPPORTED whenever I call cudnnBackendFinalize() on the CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR when trying to fuse a non-broadcast pointwise addition before a matmul.
I can successfully fuse a ReLU before the matmul, and I can also fuse a broadcast pointwise addition before the matmul. The same non-broadcast pointwise addition also works perfectly when fused after the matmul.
The data type of all tensors in the fusion is FP16 and the compute type is FP32, as per the fusion engine limitations.
This error suggests I am running up against one of the fusion engine's limitations, but having studied them I cannot figure out what I am doing wrong. Any suggestions would be greatly appreciated.
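For reference, here is a condensed sketch of how the graph is wired (tensor shapes, unique ids, the `makeTensor` helper, and error checking are simplified placeholders, not my exact code; the pointwise ADD output is a virtual tensor feeding the matmul A operand):

```cpp
#include <cudnn.h>

// Illustrative helper: creates and finalizes a 3-D FP16 backend tensor.
// 'virt' marks the virtual intermediate that links the two fused ops.
static cudnnBackendDescriptor_t makeTensor(int64_t id, bool virt,
                                           const int64_t dim[3],
                                           const int64_t str[3]) {
    cudnnBackendDescriptor_t t;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_TENSOR_DESCRIPTOR, &t);
    cudnnDataType_t half = CUDNN_DATA_HALF;
    int64_t align = 16;
    cudnnBackendSetAttribute(t, CUDNN_ATTR_TENSOR_DATA_TYPE,
                             CUDNN_TYPE_DATA_TYPE, 1, &half);
    cudnnBackendSetAttribute(t, CUDNN_ATTR_TENSOR_DIMENSIONS,
                             CUDNN_TYPE_INT64, 3, dim);
    cudnnBackendSetAttribute(t, CUDNN_ATTR_TENSOR_STRIDES,
                             CUDNN_TYPE_INT64, 3, str);
    cudnnBackendSetAttribute(t, CUDNN_ATTR_TENSOR_UNIQUE_ID,
                             CUDNN_TYPE_INT64, 1, &id);
    cudnnBackendSetAttribute(t, CUDNN_ATTR_TENSOR_BYTE_ALIGNMENT,
                             CUDNN_TYPE_INT64, 1, &align);
    cudnnBackendSetAttribute(t, CUDNN_ATTR_TENSOR_IS_VIRTUAL,
                             CUDNN_TYPE_BOOLEAN, 1, &virt);
    cudnnBackendFinalize(t);
    return t;
}

void buildPlan(cudnnHandle_t handle) {
    // Placeholder shapes: x and b have the SAME full shape (non-broadcast).
    const int64_t dim[3] = {1, 64, 64}, str[3] = {64 * 64, 64, 1};
    cudnnBackendDescriptor_t x = makeTensor(1, false, dim, str);
    cudnnBackendDescriptor_t b = makeTensor(2, false, dim, str);
    cudnnBackendDescriptor_t v = makeTensor(3, true,  dim, str); // virtual ADD output
    cudnnBackendDescriptor_t w = makeTensor(4, false, dim, str); // matmul B operand
    cudnnBackendDescriptor_t y = makeTensor(5, false, dim, str); // matmul output

    // Pointwise ADD with FP32 math precision: v = x + b.
    cudnnBackendDescriptor_t pw, addOp;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_POINTWISE_DESCRIPTOR, &pw);
    cudnnPointwiseMode_t mode = CUDNN_POINTWISE_ADD;
    cudnnDataType_t fp32 = CUDNN_DATA_FLOAT;
    cudnnBackendSetAttribute(pw, CUDNN_ATTR_POINTWISE_MODE,
                             CUDNN_TYPE_POINTWISE_MODE, 1, &mode);
    cudnnBackendSetAttribute(pw, CUDNN_ATTR_POINTWISE_MATH_PREC,
                             CUDNN_TYPE_DATA_TYPE, 1, &fp32);
    cudnnBackendFinalize(pw);

    cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR,
                                 &addOp);
    cudnnBackendSetAttribute(addOp, CUDNN_ATTR_OPERATION_POINTWISE_PW_DESCRIPTOR,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &pw);
    cudnnBackendSetAttribute(addOp, CUDNN_ATTR_OPERATION_POINTWISE_XDESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &x);
    cudnnBackendSetAttribute(addOp, CUDNN_ATTR_OPERATION_POINTWISE_BDESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &b);
    cudnnBackendSetAttribute(addOp, CUDNN_ATTR_OPERATION_POINTWISE_YDESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &v);
    cudnnBackendFinalize(addOp);

    // Matmul consuming the virtual ADD output as its A operand: y = v * w.
    cudnnBackendDescriptor_t mm, mmOp;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_MATMUL_DESCRIPTOR, &mm);
    cudnnBackendSetAttribute(mm, CUDNN_ATTR_MATMUL_COMP_TYPE,
                             CUDNN_TYPE_DATA_TYPE, 1, &fp32);
    cudnnBackendFinalize(mm);
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_MATMUL_DESCRIPTOR, &mmOp);
    cudnnBackendSetAttribute(mmOp, CUDNN_ATTR_OPERATION_MATMUL_DESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &mm);
    cudnnBackendSetAttribute(mmOp, CUDNN_ATTR_OPERATION_MATMUL_ADESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &v);
    cudnnBackendSetAttribute(mmOp, CUDNN_ATTR_OPERATION_MATMUL_BDESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &w);
    cudnnBackendSetAttribute(mmOp, CUDNN_ATTR_OPERATION_MATMUL_CDESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &y);
    cudnnBackendFinalize(mmOp);

    // Operation graph -> engine -> engine config -> execution plan.
    cudnnBackendDescriptor_t ops[2] = {addOp, mmOp};
    cudnnBackendDescriptor_t graph, engine, cfg, plan;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR, &graph);
    cudnnBackendSetAttribute(graph, CUDNN_ATTR_OPERATIONGRAPH_OPS,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 2, ops);
    cudnnBackendSetAttribute(graph, CUDNN_ATTR_OPERATIONGRAPH_HANDLE,
                             CUDNN_TYPE_HANDLE, 1, &handle);
    cudnnBackendFinalize(graph);

    cudnnBackendCreateDescriptor(CUDNN_BACKEND_ENGINE_DESCRIPTOR, &engine);
    int64_t gidx = 0;  // illustrative engine choice
    cudnnBackendSetAttribute(engine, CUDNN_ATTR_ENGINE_OPERATION_GRAPH,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &graph);
    cudnnBackendSetAttribute(engine, CUDNN_ATTR_ENGINE_GLOBAL_INDEX,
                             CUDNN_TYPE_INT64, 1, &gidx);
    cudnnBackendFinalize(engine);

    cudnnBackendCreateDescriptor(CUDNN_BACKEND_ENGINECFG_DESCRIPTOR, &cfg);
    cudnnBackendSetAttribute(cfg, CUDNN_ATTR_ENGINECFG_ENGINE,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &engine);
    cudnnBackendFinalize(cfg);

    cudnnBackendCreateDescriptor(CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR, &plan);
    cudnnBackendSetAttribute(plan, CUDNN_ATTR_EXECUTION_PLAN_HANDLE,
                             CUDNN_TYPE_HANDLE, 1, &handle);
    cudnnBackendSetAttribute(plan, CUDNN_ATTR_EXECUTION_PLAN_ENGINE_CONFIG,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &cfg);
    cudnnBackendFinalize(plan);  // fails here with CUDNN_STATUS_NOT_SUPPORTED
}
```

Swapping the ADESC input from the virtual tensor to a plain graph input (i.e. no prologue fusion), or replacing the ADD with a ReLU, makes the plan finalize successfully.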