Backward Activation in cuDNN Graph API 9.5.1: Clarification on Y Tensor

I am currently working with the cuDNN Graph API as a learning exercise. While creating single-node graphs to test my implementation, I ran into an issue with backward activation (pointwise).

The documentation for the CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR describes the following attributes (a sketch of how the tensor descriptors they reference are built follows the list):

  • CUDNN_ATTR_OPERATION_POINTWISE_XDESC: Descriptor for input tensor X. Required for pointwise mathematical functions or activation forward propagation.
  • CUDNN_ATTR_OPERATION_POINTWISE_BDESC: Descriptor for a second input tensor B, used in dual-input operations (e.g., add/multiply). Not required for single-input operations.
  • CUDNN_ATTR_OPERATION_POINTWISE_YDESC: Descriptor for output tensor Y. Required for pointwise mathematical functions or activation forward propagation.
  • CUDNN_ATTR_OPERATION_POINTWISE_DXDESC: Descriptor for output tensor dX. Required for pointwise activation backpropagation.
  • CUDNN_ATTR_OPERATION_POINTWISE_DYDESC: Descriptor for input tensor dY. Required for pointwise activation backpropagation.
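
For concreteness, here is a minimal sketch of how one of the tensor descriptors these attributes point to can be built with the backend API. The shape, strides, uid, and the `make_x_desc` helper are my own example choices, and error checking is omitted:

```cpp
#include <cudnn.h>

// Sketch only: builds the X tensor descriptor referenced by
// CUDNN_ATTR_OPERATION_POINTWISE_XDESC. Values are illustrative.
constexpr int64_t X_UID = 1;  // must be unique per tensor in the graph

cudnnBackendDescriptor_t make_x_desc() {
    cudnnBackendDescriptor_t x_desc;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_TENSOR_DESCRIPTOR, &x_desc);

    cudnnDataType_t dtype = CUDNN_DATA_FLOAT;
    int64_t dims[]    = {1, 8, 16, 16};      // NCHW
    int64_t strides[] = {2048, 256, 16, 1};  // fully packed layout
    int64_t uid       = X_UID;
    int64_t align     = 16;                  // byte alignment of the device pointer

    cudnnBackendSetAttribute(x_desc, CUDNN_ATTR_TENSOR_DATA_TYPE,
                             CUDNN_TYPE_DATA_TYPE, 1, &dtype);
    cudnnBackendSetAttribute(x_desc, CUDNN_ATTR_TENSOR_DIMENSIONS,
                             CUDNN_TYPE_INT64, 4, dims);
    cudnnBackendSetAttribute(x_desc, CUDNN_ATTR_TENSOR_STRIDES,
                             CUDNN_TYPE_INT64, 4, strides);
    cudnnBackendSetAttribute(x_desc, CUDNN_ATTR_TENSOR_UNIQUE_ID,
                             CUDNN_TYPE_INT64, 1, &uid);
    cudnnBackendSetAttribute(x_desc, CUDNN_ATTR_TENSOR_BYTE_ALIGNMENT,
                             CUDNN_TYPE_INT64, 1, &align);
    cudnnBackendFinalize(x_desc);
    return x_desc;
}
```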

The documentation clearly states that:

  • XDESC and YDESC are required for forward propagation.
  • DXDESC and DYDESC are required for backward propagation.

The Issue:

  1. If I only set DXDESC and DYDESC, I get a CUDNN_STATUS_BAD_PARAM error during the finalization of the CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR.
  2. If I also set XDESC and YDESC, the descriptor finalizes successfully, but I encounter the following error during graph execution:

E! CuDNN (v90501 17) function cudnnBackendExecute() called:
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: varPackUidCount > expected_uid_count
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: isCompatibleVariantPack(vars)
e! Error: CUDNN_STATUS_BAD_PARAM_NULL_POINTER; Reason: plan.getEnginePtr()->execute(vars, handle->streamId)
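
For context, this is roughly how I build the variant pack and invoke cudnnBackendExecute(); the uids, pointer names, and the `execute_plan` helper are mine, and error checking is omitted. The uid-count check from the log appears to fire at this point whenever YDESC is also set on the op:

```cpp
#include <cudnn.h>

// Sketch only: the uids must match the CUDNN_ATTR_TENSOR_UNIQUE_ID values
// set on the corresponding tensor descriptors.
constexpr int64_t X_UID = 1, DX_UID = 2, DY_UID = 3;

void execute_plan(cudnnHandle_t handle, cudnnBackendDescriptor_t plan,
                  void* x_dev, void* dx_dev, void* dy_dev, void* workspace) {
    int64_t uids[] = {X_UID, DX_UID, DY_UID};
    void*   ptrs[] = {x_dev, dx_dev, dy_dev};  // matching device pointers

    cudnnBackendDescriptor_t varpack;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_VARIANT_PACK_DESCRIPTOR, &varpack);
    cudnnBackendSetAttribute(varpack, CUDNN_ATTR_VARIANT_PACK_UNIQUE_IDS,
                             CUDNN_TYPE_INT64, 3, uids);
    cudnnBackendSetAttribute(varpack, CUDNN_ATTR_VARIANT_PACK_DATA_POINTERS,
                             CUDNN_TYPE_VOID_PTR, 3, ptrs);
    cudnnBackendSetAttribute(varpack, CUDNN_ATTR_VARIANT_PACK_WORKSPACE,
                             CUDNN_TYPE_VOID_PTR, 1, &workspace);
    cudnnBackendFinalize(varpack);

    cudnnBackendExecute(handle, plan, varpack);  // fails as logged above when YDESC was set
}
```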

The working configuration was to set XDESC, DXDESC, and DYDESC, while leaving YDESC unset.
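
Concretely, here is a minimal sketch of that working configuration, assuming CUDNN_POINTWISE_RELU_BWD as the mode; the mode choice, helper name, and omitted error checking are mine:

```cpp
#include <cudnn.h>

// Sketch only: x_desc/dx_desc/dy_desc are finalized tensor descriptors
// built as sketched earlier.
cudnnBackendDescriptor_t make_relu_bwd_op(cudnnBackendDescriptor_t x_desc,
                                          cudnnBackendDescriptor_t dx_desc,
                                          cudnnBackendDescriptor_t dy_desc) {
    // Pointwise math descriptor carrying the activation-backward mode.
    cudnnBackendDescriptor_t pw;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_POINTWISE_DESCRIPTOR, &pw);
    cudnnPointwiseMode_t mode = CUDNN_POINTWISE_RELU_BWD;
    cudnnBackendSetAttribute(pw, CUDNN_ATTR_POINTWISE_MODE,
                             CUDNN_TYPE_POINTWISE_MODE, 1, &mode);
    cudnnBackendFinalize(pw);

    // Operation node: XDESC + DXDESC + DYDESC, and no YDESC.
    cudnnBackendDescriptor_t op;
    cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR, &op);
    cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_PW_DESCRIPTOR,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &pw);
    cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_XDESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &x_desc);
    cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_DXDESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &dx_desc);
    cudnnBackendSetAttribute(op, CUDNN_ATTR_OPERATION_POINTWISE_DYDESC,
                             CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &dy_desc);
    cudnnBackendFinalize(op);  // finalizes, and the graph executes without error
    return op;
}
```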

My Confusion:

The documentation implies that XDESC is primarily for forward propagation. However:

  • In my tests, XDESC is also required for backward propagation.
  • Meanwhile, YDESC, which intuitively could be useful for functions like Sigmoid or Tanh, causes execution errors when included.

Theoretical Assumption:

Certain activation functions behave differently in backward propagation:

  • Swish, ReLU: These rely on the input value before activation (XDESC).
  • Sigmoid, Tanh: These can compute gradients using only the output value after activation (YDESC); see the formulas sketched after this list.
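
For reference, these are the textbook elementwise gradients behind that assumption (standard math, not cuDNN internals):

```cpp
#include <cmath>

// dy is the incoming gradient; x is the pre-activation input, y the output.
float relu_bwd(float dy, float x)    { return x > 0.0f ? dy : 0.0f; }  // needs x
float swish_bwd(float dy, float x) {                                   // needs x
    float s = 1.0f / (1.0f + std::exp(-x));                            // sigma(x)
    return dy * (s + x * s * (1.0f - s));                              // d/dx [x * sigma(x)]
}
float sigmoid_bwd(float dy, float y) { return dy * y * (1.0f - y); }   // y alone suffices
float tanh_bwd(float dy, float y)    { return dy * (1.0f - y * y); }   // y alone suffices
```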

Question:

Is my assumption correct that backward activation requires XDESC, DXDESC, and DYDESC, but never YDESC?

Thank you for your help and clarification!

Hi @kiwimanshare,
I am checking on this issue and will update the thread.

Thanks
