Swish versus GELU: Which Activation Function Should You Choose for Image Classification, and Why?

The deeper the model, the more Swish tends to pull ahead. And beyond ReLU, what other activation functions are out there?
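To keep the comparison concrete, here is a minimal sketch of the three functions in plain Python. It uses the standard definitions: ReLU is max(0, x), Swish is x · sigmoid(βx) (with β = 1 it is also known as SiLU), and GELU is x · Φ(x), where Φ is the standard normal CDF. The helper names (`relu`, `swish`, `gelu`, `gelu_tanh`) are illustrative only. Real models would use a framework's vectorized layers rather than scalar functions like these.

```python
import math


def relu(x: float) -> float:
    """ReLU: max(0, x). Zero for all negative inputs."""
    return max(0.0, x)


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow in exp for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x). With beta = 1 this is also called SiLU."""
    return x * sigmoid(beta * x)


def gelu(x: float) -> float:
    """GELU (exact form): x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Widely used tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print the three activations side by side on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  "
              f"swish={swish(x):+.4f}  gelu={gelu(x):+.4f}")
```

Printing a few values makes the key difference visible. ReLU clamps every negative input to exactly zero. Swish and GELU are smooth and let small negative values through: Swish dips to roughly -0.28 near x ≈ -1.28 before rising back toward zero, and GELU behaves similarly. For large positive inputs, all three approach the identity.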
