I am playing around with DIGITS. I found that when I train, for instance, GoogLeNet with the SGD solver and a learning rate of 0.1, the accuracy reaches 100% within a few epochs, but the loss remains very high (around 80%) and does not change until training finishes. If I only change the learning rate to 0.01, things become "normal" with SGD: while the accuracy increases, the loss decreases as expected. Additionally, if I use AdaDelta under the same conditions with LR=0.1, things become normal too.
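To show what I mean, here is a toy NumPy sketch I put together (my own made-up 1-D quadratic, not DIGITS code, and the curvature value is just an assumption): plain SGD blows up once the learning rate exceeds the stability limit of 2/curvature, while AdaDelta's adaptive steps stay stable even at the same nominal rate.

```python
import numpy as np

def sgd(w, lr, steps, curvature=25.0):
    # Plain SGD on f(w) = 0.5 * curvature * w^2, so grad = curvature * w.
    # Each step multiplies w by (1 - lr * curvature); the update is
    # unstable whenever lr > 2 / curvature.
    for _ in range(steps):
        w -= lr * curvature * w
    return w

def adadelta(w, steps, rho=0.95, eps=1e-6, curvature=25.0):
    # AdaDelta: the effective step size is rebuilt each iteration from
    # running averages of squared gradients and squared updates, so the
    # nominal learning rate matters much less than for plain SGD.
    eg2 = edx2 = 0.0  # running averages E[g^2] and E[dx^2]
    for _ in range(steps):
        g = curvature * w
        eg2 = rho * eg2 + (1 - rho) * g * g
        dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * g
        edx2 = rho * edx2 + (1 - rho) * dx * dx
        w += dx
    return w

print(abs(sgd(1.0, 0.1, 50)))    # lr=0.1 > 2/25=0.08 -> diverges
print(abs(sgd(1.0, 0.01, 50)))   # lr=0.01 < 0.08 -> converges
print(abs(adadelta(1.0, 200)))   # adaptive steps remain bounded
```

Of course a real GoogLeNet loss surface is nothing like a 1-D quadratic, but this matches the pattern I see: SGD at 0.1 misbehaves, SGD at 0.01 and AdaDelta at 0.1 behave normally.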
I wonder if there is any rule of thumb for choosing a "matching" pair of learning rate and solver type? A reference to a research paper would help too.
Sorry for my poor English if I said something very weird.
Thanks in advance!