I am currently implementing the cuDNN Graph API as a learning exercise. While creating single-node graphs to test my implementation, I encountered an issue with Backward Activation (Pointwise).
The documentation for `CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR` describes the following attributes:
- `CUDNN_ATTR_OPERATION_POINTWISE_XDESC`: Descriptor for input tensor X. Required for pointwise mathematical functions or activation forward propagation.
- `CUDNN_ATTR_OPERATION_POINTWISE_BDESC`: Descriptor for a second input tensor B, used in dual-input operations (e.g., add/multiply). Not required for single-input operations.
- `CUDNN_ATTR_OPERATION_POINTWISE_YDESC`: Descriptor for output tensor Y. Required for pointwise mathematical functions or activation forward propagation.
- `CUDNN_ATTR_OPERATION_POINTWISE_DXDESC`: Descriptor for output tensor dX. Required for pointwise activation backpropagation.
- `CUDNN_ATTR_OPERATION_POINTWISE_DYDESC`: Descriptor for input tensor dY. Required for pointwise activation backpropagation.
The documentation clearly states that `XDESC` and `YDESC` are required for forward propagation, while `DXDESC` and `DYDESC` are required for backward propagation.
The Issue:
- If I only set `DXDESC` and `DYDESC`, I get a `CUDNN_STATUS_BAD_PARAM` error during finalization of the `CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR`.
- If I also set `XDESC` and `YDESC`, the descriptor finalizes successfully, but I encounter the following error during graph execution:
E! CuDNN (v90501 17) function cudnnBackendExecute() called:
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: varPackUidCount > expected_uid_count
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: isCompatibleVariantPack(vars)
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: plan.getEnginePtr()->execute(vars, handle->streamId)
The working configuration was to set `XDESC`, `DXDESC`, and `DYDESC`, while leaving `YDESC` unset.
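For reference, the setup that finalized and executed for me can be sketched roughly as follows. This is not runnable on its own: it assumes `xDesc`, `dyDesc`, and `dxDesc` are previously finalized `CUDNN_BACKEND_TENSOR_DESCRIPTOR` handles and `pwDesc` is a finalized `CUDNN_BACKEND_POINTWISE_DESCRIPTOR` with a backward mode (e.g. `CUDNN_POINTWISE_RELU_BWD`), and all error checking is omitted.

```c
cudnnBackendDescriptor_t op;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR, &op);

/* Attach the pointwise (backward-mode) descriptor and the three tensors
 * that worked in my tests. */
cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_PW_DESCRIPTOR,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &pwDesc);
cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_XDESC,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &xDesc);
cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_DYDESC,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &dyDesc);
cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_DXDESC,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &dxDesc);
/* Note: CUDNN_ATTR_OPERATION_POINTWISE_YDESC is deliberately NOT set;
 * setting it caused the cudnnBackendExecute() failure above. */

cudnnBackendFinalize(op);
```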
My Confusion:
The documentation implies that XDESC is primarily for forward propagation. However:
- In my tests, `XDESC` is also required for backward propagation.
- Meanwhile, `YDESC`, which intuitively could be useful for functions like Sigmoid or Tanh, causes execution errors when included.
Theoretical Assumption:
Certain activation functions behave differently in backward propagation:
- Swish, ReLU: These rely on the input value before activation (`XDESC`).
- Sigmoid, Tanh: These can compute gradients using only the output value after activation (`YDESC`).
Questions:
Is my assumption correct that backward activation requires `XDESC`, `DXDESC`, and `DYDESC`, but never `YDESC`?
Thank you for your help and clarification!