I am currently writing my own code against the cuDNN Graph API as a learning exercise. While creating single-node graphs to test my implementation, I encountered an issue with Backward Activation (Pointwise).
The documentation for the CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR describes the following attributes:
- CUDNN_ATTR_OPERATION_POINTWISE_XDESC: Descriptor for input tensor X. Required for pointwise mathematical functions or activation forward propagation.
- CUDNN_ATTR_OPERATION_POINTWISE_BDESC: Descriptor for a second input tensor B, used in dual-input operations (e.g., add/multiply). Not required for single-input operations.
- CUDNN_ATTR_OPERATION_POINTWISE_YDESC: Descriptor for output tensor Y. Required for pointwise mathematical functions or activation forward propagation.
- CUDNN_ATTR_OPERATION_POINTWISE_DXDESC: Descriptor for output tensor dX. Required for pointwise activation backpropagation.
- CUDNN_ATTR_OPERATION_POINTWISE_DYDESC: Descriptor for input tensor dY. Required for pointwise activation backpropagation.
The documentation clearly states that:
- XDESC and YDESC are required for forward propagation.
- DXDESC and DYDESC are required for backward propagation.
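For context, here is a minimal sketch of how I understand these attributes are meant to be attached to the operation descriptor in the forward case. The names op, pwDesc, xDesc, and yDesc are my own placeholders for backend descriptors that have already been created (and, for the tensor and pointwise descriptors, finalized); error checking is omitted:

```c
/* Minimal sketch (forward activation). pwDesc is a CUDNN_BACKEND_POINTWISE_DESCRIPTOR
 * finalized with a forward mode such as CUDNN_POINTWISE_RELU_FWD; xDesc and yDesc are
 * finalized CUDNN_BACKEND_TENSOR_DESCRIPTORs. Error checking omitted for brevity. */
cudnnBackendDescriptor_t op;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR, &op);

cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_PW_DESCRIPTOR,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &pwDesc);
cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_XDESC,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &xDesc);
cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_YDESC,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &yDesc);

cudnnBackendFinalize(op); /* forward: finalizes as the documentation describes */
```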
The Issue:
- If I set only DXDESC and DYDESC, I get a CUDNN_STATUS_BAD_PARAM error during finalization of the CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR.
- If I also set XDESC and YDESC, the descriptor finalizes successfully, but I encounter the following error during graph execution:
E! CuDNN (v90501 17) function cudnnBackendExecute() called:
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: varPackUidCount > expected_uid_count
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: isCompatibleVariantPack(vars)
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: plan.getEnginePtr()->execute(vars, handle->streamId)
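For reference, this is roughly how I build and pass the variant pack; UID_X, devX, workspace, handle, plan, and the other names below are my own placeholders. My reading of the error, which may well be wrong, is that the variant pack ends up containing more tensor UIDs than the finalized plan expects, i.e. the engine does not actually consume Y for this operation even though YDESC was accepted at finalization:

```c
/* Sketch of the variant pack passed to cudnnBackendExecute(). The UID list must match
 * exactly the (non-virtual) tensors that the finalized execution plan expects. */
int64_t uids[] = { UID_X, UID_DY, UID_DX };   /* tensor UIDs used in the graph */
void   *ptrs[] = { devX,  devDY,  devDX  };   /* matching device pointers      */

cudnnBackendDescriptor_t varPack;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_VARIANT_PACK_DESCRIPTOR, &varPack);
cudnnBackendSetAttribute(varPack, CUDNN_ATTR_VARIANT_PACK_UNIQUE_IDS,
                         CUDNN_TYPE_INT64, 3, uids);
cudnnBackendSetAttribute(varPack, CUDNN_ATTR_VARIANT_PACK_DATA_POINTERS,
                         CUDNN_TYPE_VOID_PTR, 3, ptrs);
cudnnBackendSetAttribute(varPack, CUDNN_ATTR_VARIANT_PACK_WORKSPACE,
                         CUDNN_TYPE_VOID_PTR, 1, &workspace);
cudnnBackendFinalize(varPack);

cudnnBackendExecute(handle, plan, varPack);
```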
The working configuration was to set XDESC, DXDESC, and DYDESC, while leaving YDESC unset.
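Concretely, the attribute combination that executed without errors looks roughly like this. Again the names are placeholders; pwBwdDesc is a CUDNN_BACKEND_POINTWISE_DESCRIPTOR finalized with a backward mode such as CUDNN_POINTWISE_RELU_BWD:

```c
/* Backward pointwise activation: the combination that worked for me.
 * xDesc, dyDesc, dxDesc are finalized CUDNN_BACKEND_TENSOR_DESCRIPTORs;
 * YDESC is deliberately left unset. */
cudnnBackendDescriptor_t opBwd;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR, &opBwd);

cudnnBackendSetAttribute(opBwd, CUDNN_ATTR_OPERATION_POINTWISE_PW_DESCRIPTOR,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &pwBwdDesc);
cudnnBackendSetAttribute(opBwd, CUDNN_ATTR_OPERATION_POINTWISE_XDESC,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &xDesc);
cudnnBackendSetAttribute(opBwd, CUDNN_ATTR_OPERATION_POINTWISE_DYDESC,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &dyDesc);
cudnnBackendSetAttribute(opBwd, CUDNN_ATTR_OPERATION_POINTWISE_DXDESC,
                         CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &dxDesc);

cudnnBackendFinalize(opBwd); /* succeeds; adding YDESC leads to the execute-time error above */
```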
My Confusion:
The documentation implies that XDESC is primarily for forward propagation. However:
- In my tests, XDESC is also required for backward propagation.
- Meanwhile, YDESC, which intuitively could be useful for functions like Sigmoid or Tanh, causes execution errors when included.
Theoretical Assumption:
Certain activation functions need different inputs in backward propagation (see the sketch after this list):
- Swish, ReLU: these rely on the input value before activation (XDESC).
- Sigmoid, Tanh: these can compute their gradients using only the output value after activation (YDESC).
Questions:
Is my assumption correct that backward activation requires XDESC, DXDESC, and DYDESC, but never YDESC?
Thank you for your help and clarification!