MAP-THOR: Benchmarking Long-Horizon Multi-Agent Planning Frameworks in Partially Observable Environments
Published in Multi-modal Foundation Model meets Embodied AI Workshop @ICML, 2024
Authors: Siddharth Nayak*, Adelmo Morrison Orozco*, Marina Ten Have, Vittal Thirumalai, Jackson Zhang, Darren Chen, Aditya Kapoor, Eric Robinson, Karthik Gopalakrishnan, James Harrison, Anuj Mahajan, Hamsa Balakrishnan
Evaluating embodied multi-agent planners necessitates robust and versatile benchmarks. We introduce MAP-THOR (Multi-Agent Planning in AI2-THOR), a benchmark specifically designed to assess the performance of embodied multi-agent planning systems in realistic, partially observable environments within the AI2-THOR simulator. Existing benchmarks offer extensive environments for single-agent tasks, but fail to capture the complexities inherent in multi-agent interactions, non-stationarity, partial observability, and long-horizon planning. To address these gaps, MAP-THOR facilitates the development of frameworks that allocate tasks and enable coordination among multiple agents. MAP-THOR introduces a comprehensive suite of household tasks that demand collaboration and adaptation to dynamic environmental changes, mirroring real-world scenarios. Our benchmark includes detailed metrics for success rate, efficiency, and collaborative effectiveness, setting a new standard for evaluating multi-agent planning systems. Through rigorous experiments, we show that MAP-THOR offers a robust evaluation framework for language model (LM)-based multi-agent planning systems. Ultimately, we hope that MAP-THOR serves as a standard benchmark to identify embodied multi-agent planning frameworks that systematically improve generalization for long-horizon, partially observable planning.