While everything works fine in the CI tests for the CPU versions, on my local machine I have CUDA 11.2.0 and cuDNN 8.0.5.39, and the backward pass for the ELU activation doesn’t work: the gradient is not computed correctly, and any training that uses that layer drives the loss to NaN.
I have implemented a CUDA backward pass myself in dlib, and that one works. However, I would like to know why the cuDNN implementation does not.
Hi @AakankshaS, the backward pass code that I used for the ELU can be found here:
The link should highlight the relevant lines.
As you can see, I added an implementation for the clipped_relu and that worked perfectly, but the elu version doesn’t. The implementation on the dlib side is exactly the same; I just changed the activation descriptor.
You’ll see that it is commented out, because the backward pass does not compute the gradients properly, so I rolled my own.
ReLU works in both forward and backward calls, but the other two only work in forward mode (tested against my CPU implementation, which I believe is correct…).
I am probably using the API wrong, but the only difference between these activation layers and ReLU is that they take an extra coef parameter in the activation descriptor.
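To make the call pattern concrete, this is roughly how I understand the cuDNN side. It is a hedged sketch, not dlib’s actual code: error checking is elided, and I assume a single tensor descriptor shared by x, y, dy, and dx:

```cpp
#include <cudnn.h>

// Sketch of the ELU backward call as I understand the cuDNN API
// (not dlib's exact code; error checking elided).
void elu_backward_cudnn(cudnnHandle_t handle,
                        cudnnTensorDescriptor_t desc, // shared by x, y, dy, dx
                        const float* y, const float* dy, const float* x,
                        float* dx, double coef)
{
    cudnnActivationDescriptor_t act;
    cudnnCreateActivationDescriptor(&act);
    // coef is the ELU alpha here; for CUDNN_ACTIVATION_CLIPPED_RELU
    // the same argument is the clipping ceiling instead.
    cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_ELU,
                                 CUDNN_NOT_PROPAGATE_NAN, coef);

    const float one = 1.0f, zero = 0.0f;
    // Note the argument order: y and dy come before x, and cuDNN wants
    // the forward output y in addition to the input x.
    cudnnActivationBackward(handle, act,
                            &one,  desc, y,
                                   desc, dy,
                                   desc, x,
                            &zero, desc, dx);

    cudnnDestroyActivationDescriptor(act);
}
```

If I am mis-ordering the y/dy/x arguments or passing the wrong tensor for y, that would explain wrong gradients while the forward pass still looks fine, so that is where I would double-check first.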