We introduce Optima, a novel framework for training LLM-based multi-agent systems (MAS) that significantly enhances both communication efficiency and task effectiveness. Optima addresses key challenges in existing MAS implementations:
- Inefficient inter-agent communication leading to high token usage.
- Lack of systematic methods to optimize LLM-based MAS as a cohesive unit.
Our approach provides a comprehensive solution to these challenges, demonstrating substantial improvements in both performance and efficiency.
Optima employs an iterative generate, rank, and train paradigm, exploring Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and their hybrid. We integrate a Monte Carlo Tree Search (MCTS)-inspired approach to generate high-quality DPO training data in multi-agent settings.
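To make the loop concrete, below is a minimal sketch of one generate-rank-train iteration. All helper callables (`generate`, `sft_update`, `dpo_update`, `build_pairs`) and the `Trajectory` container are hypothetical placeholders for illustration, not Optima's actual API.

```python
# Sketch of one Optima-style iteration (generate -> rank -> train).
# Helper callables and Trajectory are illustrative stand-ins only.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trajectory:
    conversation: str   # serialized multi-agent exchange
    reward: float       # task score traded off against token cost

def optima_iteration(
    model,
    tasks: List[str],
    generate: Callable,      # (model, task) -> List[Trajectory]
    sft_update: Callable,    # (model, trajectories) -> model
    dpo_update: Callable,    # (model, preference_pairs) -> model
    build_pairs: Callable,   # (model, tasks) -> List[tuple], MCTS-inspired expansion
    mode: str = "hybrid",
    top_frac: float = 0.25,
):
    # 1. Generate: sample multi-agent conversations for every task.
    trajectories = [t for task in tasks for t in generate(model, task)]

    # 2. Rank: keep only the highest-reward trajectories as training data.
    trajectories.sort(key=lambda t: t.reward, reverse=True)
    top_k = trajectories[: max(1, int(len(trajectories) * top_frac))]

    # 3. Train: SFT on top trajectories, DPO on preference pairs, or both.
    if mode in ("sft", "hybrid"):
        model = sft_update(model, top_k)
    if mode in ("dpo", "hybrid"):
        pairs = build_pairs(model, tasks)   # (preferred, rejected) conversations
        model = dpo_update(model, pairs)
    return model
```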
Our framework balances task performance, token efficiency, and communication interpretability, leading to the development of effective, efficient, and interpretable multi-agent systems.
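For intuition, the trade-off can be summarized as a single scalar reward used when ranking trajectories. The weighting below is an illustrative assumption, not Optima's exact formulation; the `lambda_*` coefficients, normalization, and use of a base-LM log-probability as a readability proxy are all hypothetical choices.

```python
def composite_reward(
    task_score: float,      # e.g., answer correctness in [0, 1]
    num_tokens: int,        # total tokens exchanged between agents
    avg_log_prob: float,    # mean log-prob of the exchange under a base LM,
                            # used here as a proxy for interpretability
    max_tokens: int = 2048,
    lambda_token: float = 0.5,
    lambda_readability: float = 0.1,
) -> float:
    # Reward task performance, penalize verbose communication, and favor
    # exchanges that remain natural-language-like (assumed weighting).
    token_penalty = min(num_tokens / max_tokens, 1.0)
    return task_score - lambda_token * token_penalty + lambda_readability * avg_log_prob

# Example: a correct answer reached in 300 tokens with fairly natural text.
print(composite_reward(task_score=1.0, num_tokens=300, avg_log_prob=-1.2))
```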
