We introduce Optima, a novel framework for training LLM-based multi-agent systems (MAS) that significantly enhances both communication efficiency and task effectiveness. Optima addresses key challenges in existing MAS implementations:
- Inefficient inter-agent communication leading to high token usage.
- Lack of systematic methods to optimize LLM-based MAS as a cohesive unit.
Our approach provides a comprehensive solution to these challenges, demonstrating substantial improvements in both performance and efficiency.
Optima employs an iterative generate, rank, and train paradigm, exploring Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and their hybrid. We integrate a Monte Carlo Tree Search (MCTS)-inspired approach to generate high-quality DPO training data in multi-agent settings.
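To make the loop concrete, below is a minimal sketch of one generate-rank-train iteration. All helper callables (`generate`, `sft_update`, `dpo_update`, `build_pairs`) and the `Trajectory` container are hypothetical placeholders for illustration, not Optima's actual API.

```python
# Sketch of one Optima-style iteration (generate -> rank -> train).
# Helper callables and Trajectory are illustrative stand-ins only.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trajectory:
    conversation: str   # serialized multi-agent exchange
    reward: float       # task score traded off against token cost

def optima_iteration(
    model,
    tasks: List[str],
    generate: Callable,      # (model, task) -> List[Trajectory]
    sft_update: Callable,    # (model, trajectories) -> model
    dpo_update: Callable,    # (model, preference_pairs) -> model
    build_pairs: Callable,   # (model, tasks) -> List[tuple], MCTS-inspired expansion
    mode: str = "hybrid",
    top_frac: float = 0.25,
):
    # 1. Generate: sample multi-agent conversations for every task.
    trajectories = [t for task in tasks for t in generate(model, task)]

    # 2. Rank: keep only the highest-reward trajectories as training data.
    trajectories.sort(key=lambda t: t.reward, reverse=True)
    top_k = trajectories[: max(1, int(len(trajectories) * top_frac))]

    # 3. Train: SFT on top trajectories, DPO on preference pairs, or both.
    if mode in ("sft", "hybrid"):
        model = sft_update(model, top_k)
    if mode in ("dpo", "hybrid"):
        pairs = build_pairs(model, tasks)   # (preferred, rejected) conversations
        model = dpo_update(model, pairs)
    return model
```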
Our framework balances task performance, token efficiency, and communication interpretability, leading to the development of effective, efficient, and interpretable multi-agent systems.
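For intuition, the trade-off can be summarized as a single scalar reward used when ranking trajectories. The weighting below is an illustrative assumption, not Optima's exact formulation; the `lambda_*` coefficients, normalization, and use of a base-LM log-probability as a readability proxy are all hypothetical choices.

```python
def composite_reward(
    task_score: float,      # e.g., answer correctness in [0, 1]
    num_tokens: int,        # total tokens exchanged between agents
    avg_log_prob: float,    # mean log-prob of the exchange under a base LM,
                            # used here as a proxy for interpretability
    max_tokens: int = 2048,
    lambda_token: float = 0.5,
    lambda_readability: float = 0.1,
) -> float:
    # Reward task performance, penalize verbose communication, and favor
    # exchanges that remain natural-language-like (assumed weighting).
    token_penalty = min(num_tokens / max_tokens, 1.0)
    return task_score - lambda_token * token_penalty + lambda_readability * avg_log_prob

# Example: a correct answer reached in 300 tokens with fairly natural text.
print(composite_reward(task_score=1.0, num_tokens=300, avg_log_prob=-1.2))
```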
