I am using DIGITS 5 and CUDA 8 on Ubuntu 14.04, on Amazon K80-based instances.
I trained a network for 30 epochs; it was doing all right, and accuracy was still climbing when training stopped.
I wanted to "keep training," which I implemented as:
- new training job
- same data set
- pre-trained model
- without customization
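
For reference, my understanding is that this setup corresponds roughly to the following Caffe invocation (assuming DIGITS 5 is driving Caffe underneath; file names here are hypothetical placeholders):

```shell
# Fine-tune from a pre-trained model: loads the weights from the
# .caffemodel, but starts with a fresh solver state (iteration count,
# learning-rate schedule position, and momentum history are all reset).
caffe train --solver=solver.prototxt --weights=pretrained.caffemodel

# By contrast, resuming from a snapshot restores the solver state as well:
caffe train --solver=solver.prototxt --snapshot=snapshot_iter_10000.solverstate
```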
When I start training, the 0th epoch shows the accuracy of the incoming model.
However, as soon as training starts, accuracy dives to no better than random (there are 20 classes, so random is 5%).
It takes about as long for accuracy to start climbing again as it did when I originally trained the model from scratch.
Something is corrupting all the progress made in the pre-trained model.
What would cause this?