Creating a data flywheel is one way to optimize your AI agents in production. Here’s how it works:
- Capture Interaction Logs: Collect agent and human interaction logs from your production deployment. User inputs serve as queries, and agent responses can be used as ground truth, even without explicit human-labeled data.
- Create a Model Distillation Cycle: Use these interaction logs to experiment with and identify smaller, more efficient models that can potentially match the accuracy of the model you have running in production. This can include customization techniques such as LoRA fine-tuning or in-context learning.
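As a rough illustration of the first step, the sketch below turns raw interaction logs into prompt/reference records and holds out a slice for evaluation. It assumes the logs are JSON Lines with `user_input` and `agent_response` fields; those field names, and the helper names `logs_to_records` and `split_records`, are hypothetical and not part of any NVIDIA API.

```python
import json
import random


def logs_to_records(log_lines):
    """Convert JSONL interaction logs into prompt/reference records.

    Assumed (hypothetical) log schema: each line is a JSON object with
    'user_input' and 'agent_response' keys. The production agent's response
    is treated as the reference answer, so no human labels are needed.
    """
    records = []
    for line in log_lines:
        entry = json.loads(line)
        prompt = (entry.get("user_input") or "").strip()
        reference = (entry.get("agent_response") or "").strip()
        # Skip incomplete interactions; they cannot serve as ground truth.
        if prompt and reference:
            records.append({"prompt": prompt, "reference": reference})
    return records


def split_records(records, eval_fraction=0.2, seed=0):
    """Shuffle deterministically and split into (train, eval) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]
```

The training slice could feed a customization job (for example LoRA fine-tuning), while the held-out slice is used to check whether a smaller candidate model reproduces the production model's answers.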
The NVIDIA NeMo microservices platform provides a set of scalable, easy-to-use microservices for building such data flywheels, with components for:
- Model customization (NeMo Customizer)
- Model evaluation (NeMo Evaluator)
- Model deployment (NVIDIA NIM)
- Adding guardrails (NeMo Guardrails)
Automate with the Data Flywheel Blueprint: This blueprint provides a reference implementation of an orchestrator that automates the entire data flywheel pipeline. With a single API call, you can kick off jobs to identify more efficient backbone models for your agents. Over the long run, this helps improve the agents' reasoning and planning skills, raising accuracy and lowering costs.
By setting up an automated or semi-automated data flywheel pipeline, you can continuously discover more efficient deployments over time and, in some cases, reduce the associated inference cost by 98% or more.
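The selection step at the heart of such a pipeline can be sketched as follows: among the evaluated candidates, pick the cheapest one whose accuracy stays within a tolerance of the production model, then compute the resulting cost reduction. This is a minimal sketch, not the blueprint's implementation; the `Candidate` type, the `max_accuracy_drop` tolerance, and the cost unit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    accuracy: float              # score on held-out log records, in [0, 1]
    cost_per_1k_requests: float  # hypothetical cost unit


def pick_cheapest_adequate(
    baseline: Candidate,
    candidates: List[Candidate],
    max_accuracy_drop: float = 0.01,
) -> Optional[Candidate]:
    """Return the cheapest candidate within the accuracy tolerance, if any."""
    floor = baseline.accuracy - max_accuracy_drop
    adequate = [c for c in candidates if c.accuracy >= floor]
    if not adequate:
        return None  # keep the production model
    return min(adequate, key=lambda c: c.cost_per_1k_requests)


def cost_reduction_pct(baseline: Candidate, candidate: Candidate) -> float:
    """Percentage reduction in inference cost relative to the baseline."""
    ratio = candidate.cost_per_1k_requests / baseline.cost_per_1k_requests
    return 100.0 * (1.0 - ratio)
```

Returning `None` when no candidate clears the accuracy floor keeps the production model in place, which is the conservative default for a semi-automated loop where a human reviews each proposed swap.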