As generalist LFMs approach diminishing returns from pure scaling, the industry is shifting toward multi-agentic systems: networks of small, specialized agents that coordinate tool use and decision-making to solve complex tasks in vertical domains. Our research aims to construct multi-agent frameworks that not only achieve superior performance and efficiency in vertical scenarios but also prioritize safety and trustworthiness, laying the groundwork for reliable real-world deployment.
Multi-Agent Framework for Vertical Domain Adaptation: We aim to construct a parallel multi-agent system (MAS) capable of long-term, continuous self-evolution, fundamentally elevating the performance and efficiency of MAS beyond the limits of traditional LFMs. The core architecture comprises a lead agent responsible for high-level planning and task orchestration, alongside specialized sub-agents tailored to domain-specific subtasks. A pivotal long-term goal is to build a high-quality multi-turn interactive multi-agent dataset that models long-horizon, multi-role collaborative trajectories, laying the foundation for self-evolution. Core long-term challenges include designing high-level planning and task decomposition mechanisms adaptive to complex vertical domain scenarios, developing evolutionary strategy learning methods, and exploring scalable post-training paradigms. Ultimately, we aim to realize a self-evolving multi-agent system capable of proactive optimization and adaptive improvement, enabling it to dynamically meet the evolving needs of vertical domains.
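The lead-agent/sub-agent architecture above can be sketched in a few lines. This is a minimal illustration, not our implementation: the `LeadAgent`, `SubAgent`, and `plan` names are hypothetical, and the toy string-splitting decomposition stands in for the learned planning and task-decomposition mechanisms the paragraph describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SubAgent:
    """A specialist agent handling one domain-specific subtask."""
    name: str
    skill: str
    handler: Callable[[str], str]  # placeholder for an LFM-backed policy

    def run(self, subtask: str) -> str:
        return self.handler(subtask)


@dataclass
class LeadAgent:
    """High-level planner: decomposes a task and routes subtasks by skill."""
    sub_agents: Dict[str, SubAgent] = field(default_factory=dict)

    def register(self, agent: SubAgent) -> None:
        self.sub_agents[agent.skill] = agent

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Toy decomposition: each "<skill>: <goal>" clause becomes a subtask.
        # A real system would learn this decomposition from trajectories.
        return [tuple(p.split(":", 1)) for p in task.split(";")]

    def solve(self, task: str) -> Dict[str, str]:
        results = {}
        for skill, goal in self.plan(task):
            agent = self.sub_agents[skill.strip()]
            results[agent.name] = agent.run(goal.strip())
        return results


lead = LeadAgent()
lead.register(SubAgent("retriever", "search", lambda g: f"docs for '{g}'"))
lead.register(SubAgent("analyst", "analyze", lambda g: f"analysis of '{g}'"))
out = lead.solve("search: drug interactions; analyze: dosage risks")
```

The point of the sketch is the separation of concerns: the lead agent owns decomposition and routing, while each sub-agent owns only its specialty, which is where the dataset of multi-role collaborative trajectories would be used for training.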
Collaborative Learning and Self-Reflection Capabilities for Decentralized MAS: Vertical domain applications typically require expert agents with different specialized skills to work together on complex problems. Beyond traditional hierarchical MAS where a lead agent orchestrates sub-agents, an emerging paradigm involves independent agents that may interact under collaborative, mixed-motive, or adversarial conditions. A core question is how to coordinate agents with different objectives, capabilities, costs, and risk profiles to solve tasks effectively, especially in personalized assistant scenarios where agents negotiate on behalf of different entities with divergent goals. Since the MAS environment is essentially non-stationary, we explore how multi-agent memory and self-reflection can be used to learn coordination methods that guide how agents communicate, delegate, and share knowledge. Important open questions include how to ensure memory-based self-evolution remains stable as agent and domain complexity grow, how to assign credit for reinforcement learning in decentralized settings, and how to apply game-theoretic frameworks to align incentives among independent agents.
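One way to make the memory-and-self-reflection idea concrete is a bounded episodic store whose reflections on past coordination episodes are retrieved before acting. This is a hedged sketch: the `ReflectiveMemory` class, the keyword-overlap relevance score, and the failure-first ordering are all illustrative assumptions, not the method proposed above.

```python
from collections import deque
from typing import List


class ReflectiveMemory:
    """Episodic memory with self-reflection notes, bounded in size so that
    memory-based self-evolution stays stable as complexity grows."""

    def __init__(self, capacity: int = 100):
        self.episodes = deque(maxlen=capacity)  # oldest episodes age out

    def record(self, task: str, success: bool, reflection: str) -> None:
        self.episodes.append(
            {"task": task, "success": success, "reflection": reflection}
        )

    def recall(self, task: str, k: int = 3) -> List[str]:
        # Toy relevance: keyword overlap with past tasks. Among equally
        # relevant episodes, surface failures first, since reflections on
        # failures carry the coordination lessons.
        words = set(task.lower().split())
        scored = [
            (len(words & set(e["task"].lower().split())), e)
            for e in self.episodes
        ]
        scored.sort(key=lambda s: (s[0], not s[1]["success"]), reverse=True)
        return [e["reflection"] for score, e in scored[:k] if score > 0]


mem = ReflectiveMemory()
mem.record("delegate search to retriever", False,
           "scope the query before delegating to the retriever")
mem.record("negotiate meeting time", True,
           "sharing calendars early shortened the negotiation")
hints = mem.recall("delegate a new search task")
```

In a decentralized MAS each agent would hold such a memory locally, and the open questions in the paragraph (stability at scale, credit assignment, incentive alignment) concern what is written into `record` and how `recall` shapes subsequent communication and delegation.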
Safety Governance for Trustworthy MAS Deployment: As MAS operate across collaborative, mixed-motive, and adversarial settings where agents have different objectives and risk profiles, individual agent safety alone does not guarantee collective safety. Three categories of failure arise: miscoordination among agents with shared goals, conflict among agents with different objectives, and collusion where agents cooperate against the interests of other parties. These risks are particularly relevant in personalized assistant scenarios, where agents act on behalf of different entities with divergent goals and risk profiles. The same risks extend to tool use, where agents access and share tools of varying capability and risk. We study how to govern safety in such settings and explore directions including trust assessment, safe tool access, human-in-the-loop oversight, and incentive alignment. A promising direction is to give each expert agent its own set of guardrails and optimize them through human-in-the-loop feedback or multi-agent communication, so that the safety governance of the system can self-improve over time.
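The per-agent guardrail direction can be sketched as agent-local rules that vet every proposed tool call and escalate violations for human review. This is a minimal illustration under assumed names: `Guardrail`, `GuardedAgent`, and the `escalate`/`execute` statuses are hypothetical, and real guardrails would be learned or tightened from reviewer feedback rather than hand-written predicates.

```python
from typing import Callable, Dict, List


class Guardrail:
    """One agent-local safety rule over proposed actions."""

    def __init__(self, name: str, predicate: Callable[[dict], bool]):
        self.name = name
        self.predicate = predicate  # returns True if the action is safe

    def allows(self, action: dict) -> bool:
        return self.predicate(action)


class GuardedAgent:
    """An expert agent whose own guardrails vet every tool call before
    execution; violations are escalated for human-in-the-loop oversight."""

    def __init__(self, name: str):
        self.name = name
        self.guardrails: List[Guardrail] = []

    def propose(self, action: dict) -> Dict:
        violated = [g.name for g in self.guardrails if not g.allows(action)]
        if violated:
            # Escalation point: a human reviewer or peer agents decide.
            return {"status": "escalate", "violations": violated}
        return {"status": "execute", "action": action}

    def add_guardrail(self, rule: Guardrail) -> None:
        # Human-in-the-loop feedback adds or tightens rules over time,
        # so the agent's safety governance can self-improve.
        self.guardrails.append(rule)


agent = GuardedAgent("scheduler")
agent.add_guardrail(Guardrail(
    "no_external_share",
    lambda a: a.get("tool") != "share_externally",
))
blocked = agent.propose({"tool": "share_externally", "data": "calendar"})
allowed = agent.propose({"tool": "read_calendar"})
```

Because each agent carries its own guardrails, safety policy can differ across agents with different risk profiles, and the escalation channel is where trust assessment and incentive alignment between agents would attach.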