We have integrated a Mistral 7B model (a Llama-style decoder architecture) into our enterprise query system, but updating all 7 billion parameters is computationally expensive. Rather than full retraining, we want to detrain the model, i.e. apply targeted unlearning: remove specific learned biases and adjust selected knowledge while preserving the essential information the model already has.
We attempted reinforcement learning and continual learning methods for unlearning (an illustrative sketch of the kind of full-model update involved follows this list), but these approaches resulted in:
- Loss of critical knowledge
- Unstable and unreliable responses
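
For concreteness, here is a minimal sketch of a naive full-model unlearning loop (gradient ascent on a forget set). Our RL and continual-learning attempts were different methods, but they share the same failure mode: every update touches all 7B parameters. The checkpoint name, learning rate, and data handling below are illustrative assumptions, not our production setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def unlearn_step(forget_texts):
    """One gradient-ascent step on a batch drawn from the forget set."""
    inputs = tokenizer(forget_texts, return_tensors="pt",
                       padding=True, truncation=True)
    # Mask padding positions out of the loss (-100 is ignored by the model).
    labels = inputs["input_ids"].masked_fill(inputs["attention_mask"] == 0, -100)
    outputs = model(**inputs, labels=labels)
    # Negating the loss ascends on the forget set, but the update perturbs
    # all 7B parameters at once: unrelated knowledge drifts too, which is
    # where our knowledge loss and response instability came from.
    (-outputs.loss).backward()
    optimizer.step()
    optimizer.zero_grad()
```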
We are looking for efficient detraining techniques that do the following (one candidate direction is sketched after this list):
- Reduce computational costs
- Remove specific biases/knowledge without affecting the entire model
- Maintain response reliability and stability
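
One direction that seems to fit these constraints is parameter-efficient unlearning: train a small LoRA adapter to ascend on a "forget" set while descending on a "retain" set, leaving the 7B base weights frozen. The sketch below is a minimal, unvalidated illustration; the checkpoint name, toy data, `beta` weight, and hyperparameters are all assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint
base = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Only the low-rank adapter weights receive gradients; the frozen base
# keeps compute low and limits collateral damage to unrelated knowledge.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

def lm_loss(texts):
    """Causal LM loss on a batch, with padding masked out of the labels."""
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = inputs["input_ids"].masked_fill(inputs["attention_mask"] == 0, -100)
    return model(**inputs, labels=labels).loss

# Toy stand-ins for the real forget/retain data.
forget_batches = [["Sentence exhibiting the bias we want removed."]]
retain_batches = [["Sentence exercising knowledge we must keep."]]
beta = 1.0  # placeholder weight: larger values favour retention over removal

for forget_texts, retain_texts in zip(forget_batches, retain_batches):
    # Ascend on the forget set, descend on the retain set, so the adapter
    # removes the targeted behaviour while anchoring everything else.
    loss = -lm_loss(forget_texts) + beta * lm_loss(retain_texts)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

If this adapter approach is sound, it would also let us version or roll back the unlearning step independently of the base weights, which matters for stability in production. We would appreciate pointers to techniques along these lines, or alternatives that satisfy the requirements above.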