While everything works fine in the CI tests for the CPU versions, on my local machine I have CUDA 11.2.0 and cuDNN 8.0.5.39, and the backward pass for the ELU activation doesn’t work: the gradient is not computed correctly, and any training that uses that layer drives the loss to NaN.
I have implemented a CUDA backward pass myself in dlib, and that one works. However, I would like to know why the cuDNN implementation does not.
Hi @AakankshaS, the backward pass code that I used for the ELU can be found here:
The link should highlight the relevant lines.
As you can see, I added an implementation for the clipped_relu and that worked perfectly, but the elu version doesn’t. The implementation on the dlib side is exactly the same; I just changed the activation descriptor.
You’ll see that it is commented out, because the backward pass does not compute the gradients properly, so I rolled my own.
ReLU works in both forward and backward calls, but the other two only work in forward mode (tested against my CPU implementation, which I believe is correct…).
I am probably using the API wrong, but the only difference between these activation layers and ReLU is that they take an extra coef parameter in the activation descriptor.
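To make the call pattern concrete, this is roughly how I understand the cuDNN side. It is a hedged sketch, not dlib’s actual code: error checking is elided, and I assume a single tensor descriptor shared by x, y, dy, and dx:

```cpp
#include <cudnn.h>

// Sketch of the ELU backward call as I understand the cuDNN API
// (not dlib's exact code; error checking elided).
void elu_backward_cudnn(cudnnHandle_t handle,
                        cudnnTensorDescriptor_t desc, // shared by x, y, dy, dx
                        const float* y, const float* dy, const float* x,
                        float* dx, double coef)
{
    cudnnActivationDescriptor_t act;
    cudnnCreateActivationDescriptor(&act);
    // coef is the ELU alpha here; for CUDNN_ACTIVATION_CLIPPED_RELU
    // the same argument is the clipping ceiling instead.
    cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_ELU,
                                 CUDNN_NOT_PROPAGATE_NAN, coef);

    const float one = 1.0f, zero = 0.0f;
    // Note the argument order: y and dy come before x, and cuDNN wants
    // the forward output y in addition to the input x.
    cudnnActivationBackward(handle, act,
                            &one,  desc, y,
                                   desc, dy,
                                   desc, x,
                            &zero, desc, dx);

    cudnnDestroyActivationDescriptor(act);
}
```

If I am mis-ordering the y/dy/x arguments or passing the wrong tensor for y, that would explain wrong gradients while the forward pass still looks fine, so that is where I would double-check first.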