How to detrain a Llama model?

We have integrated Mistral 7B (a 7-billion-parameter model with a Llama-style architecture) into our enterprise query system, but updating all 7 billion parameters is computationally expensive. Instead of full retraining, we want to "detrain" the model, i.e. apply machine unlearning, to remove certain learned biases and adjust its knowledge while preserving essential information.

We attempted reinforcement learning and continual learning methods for unlearning (a comparable baseline is sketched after the list below), but both approaches resulted in:

  1. Loss of critical knowledge
  2. Unstable and unreliable responses
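
For reference, here is a minimal sketch of the kind of naive whole-model unlearning baseline these methods compete with: gradient ascent on a small "forget set". It assumes the Hugging Face `transformers` stack; the checkpoint name and `forget_texts` are placeholders, and it is illustrative rather than our exact RL/continual-learning setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical "forget set": text whose learned behavior should be removed.
forget_texts = [
    "Placeholder example of the biased output we want the model to unlearn.",
]

for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    # Gradient *ascent* on the forget set: negating the LM loss pushes the
    # model away from reproducing this content.
    loss = -outputs.loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because every one of the 7 billion parameters receives updates, even a few ascent steps like this tend to damage unrelated knowledge, which matches the two failure modes above.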

We are looking for efficient detraining techniques (one candidate direction is sketched after this list) that:

  1. Reduce computational costs
  2. Remove specific biases/knowledge without affecting the entire model
  3. Maintain response reliability and stability
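
One candidate direction we are considering, sketched below, is to freeze the base weights and route the unlearning update through a small LoRA adapter via the `peft` library. The rank, learning rate, and target modules here are illustrative assumptions, not a tested recipe.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
base_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

# Freeze the 7B base weights; only the small adapter is trainable.
lora_config = LoraConfig(
    r=8,                                  # illustrative rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections in Mistral
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.train()

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

forget_texts = ["Placeholder text exhibiting the bias to remove."]
for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    # Gradient ascent confined to the adapter: the base model is untouched,
    # so the edit can later be disabled or merged.
    (-outputs.loss).backward()
    optimizer.step()
    optimizer.zero_grad()
```

Since the base weights stay frozen, the edit is localized and reversible (the adapter can simply be dropped), which addresses requirements 1 and 2; whether it also satisfies requirement 3 is what we would like feedback on.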

Integration of Mistral 7B into our system:
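
A simplified, illustrative sketch of the query path (the enterprise plumbing around it is omitted, and the checkpoint name is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)
model.eval()

def answer_query(query: str, max_new_tokens: int = 256) -> str:
    """Run one enterprise query through the model and return the completion."""
    inputs = tokenizer(query, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated answer is returned.
    answer_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(answer_ids, skip_special_tokens=True)
```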