- Authors: Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi “Jim” Fan, Anima Anandkumar
- Source: Excerpts from “2305.16291v2.pdf”
- Link: https://voyager.minedojo.org
Introduction
This paper introduces VOYAGER, an AI agent powered by Large Language Models (LLMs) that demonstrates lifelong learning in the Minecraft environment. VOYAGER continuously explores, acquires skills, and makes new discoveries autonomously. It interacts with GPT-4 through blackbox queries, which removes the need for model fine-tuning.
Key Components of VOYAGER
VOYAGER leverages three key components:
- Automatic Curriculum: This module uses GPT-4’s knowledge to generate increasingly challenging tasks, fostering continuous exploration and skill development. The curriculum adapts to the agent’s current state and progress, ensuring a manageable yet stimulating learning process.
As VOYAGER progresses to harder self-driven goals, it naturally learns a variety of skills, such as “mining a diamond.”
- Skill Library: This component stores and retrieves complex behaviors represented as executable code. Each skill is indexed by its description embedding, enabling efficient retrieval in similar situations. This approach facilitates skill compositionality, allowing VOYAGER to rapidly expand its capabilities and mitigate catastrophic forgetting.
Complex skills can be synthesized by composing simpler programs, which compounds VOYAGER’s capabilities rapidly over time and alleviates the catastrophic forgetting seen in other continual learning methods.
- Iterative Prompting Mechanism: LLMs can struggle to generate perfect code in a single attempt. To overcome this, VOYAGER uses an iterative prompting mechanism that:
  - Executes the generated code in Minecraft, gathering observations and error traces.
  - Integrates this feedback into GPT-4’s prompt for code refinement.
  - Repeats this process until a self-verification module confirms task completion.
This iterative prompting approach significantly improves program synthesis for embodied control, enabling VOYAGER to continuously acquire diverse skills without human intervention.
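The Skill Library component described above (code stored under a description, indexed by an embedding of that description, and retrieved by similarity) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `SkillLibrary`, `embed`, and `cosine` are hypothetical, the bag-of-words "embedding" is a toy stand-in for a learned text embedding model, and the stored code strings are placeholders rather than skills from the paper.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Hypothetical store: each skill is (description, embedding, code)."""

    def __init__(self) -> None:
        self._skills = []

    def add(self, description: str, code: str) -> None:
        # Index the skill by the embedding of its description.
        self._skills.append((description, embed(description), code))

    def retrieve(self, query: str, k: int = 1) -> list:
        # Rank stored skills by similarity between the query and each description.
        q = embed(query)
        ranked = sorted(self._skills, key=lambda s: cosine(q, s[1]), reverse=True)
        return [code for _, _, code in ranked[:k]]


library = SkillLibrary()
library.add("mine a wood log from a tree", "def mine_wood_log(bot): ...")
library.add("craft a wooden pickaxe", "def craft_wooden_pickaxe(bot): ...")
library.add("fight a zombie with a sword", "def fight_zombie(bot): ...")

# A new, related task retrieves the closest existing skill as a starting point.
top = library.retrieve("craft a stone pickaxe", k=1)
print(top)
```

In this sketch a query for a new task returns the most similar stored program, which a code-generating model could then reuse or compose into a more complex skill.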
Self-Verification Module
This module, powered by GPT-4, plays a crucial role in VOYAGER’s learning process. It assesses the agent’s performance by:
- Determining if the executed code successfully completed the assigned task.
- Providing critiques and suggestions for improvement if the task fails.
The authors describe this self-verification as more comprehensive than self-reflection because it both checks success and reflects on mistakes.
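The iterative prompting loop and the self-verification check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `iterative_prompting`, `Feedback`, the three callables, and the `max_rounds` limit are hypothetical, and the `fake_*` functions are stubs that replace GPT-4 and the Minecraft environment so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    observation: str  # what the environment reported after running the code
    error: str        # execution error trace, empty if none
    critique: str     # suggestion from the self-verification step


def iterative_prompting(
    task: str,
    generate_code: Callable[[str, Optional[Feedback]], str],
    execute: Callable[[str], Tuple[str, str]],
    verify: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 4,  # assumption: an arbitrary retry budget
) -> Optional[str]:
    """Refine code until the verifier confirms the task, or give up."""
    feedback = None
    for _ in range(max_rounds):
        code = generate_code(task, feedback)         # query the model with prior feedback
        observation, error = execute(code)           # run the code, collect observations and errors
        success, critique = verify(task, observation)  # check completion, critique on failure
        if success and not error:
            return code
        feedback = Feedback(observation, error, critique)
    return None


# Stubs standing in for GPT-4 and Minecraft (hypothetical, for illustration only).
def fake_llm(task: str, feedback: Optional[Feedback]) -> str:
    return "collect(1)" if feedback is None else "collect(3)"


def fake_env(code: str) -> Tuple[str, str]:
    count = int(code[len("collect("):-1])
    return f"inventory: {count} logs", ""


def fake_verifier(task: str, observation: str) -> Tuple[bool, str]:
    if observation == "inventory: 3 logs":
        return True, ""
    return False, "Need 3 logs; collect more."


result = iterative_prompting("Collect 3 logs", fake_llm, fake_env, fake_verifier)
print(result)
```

The verifier returns both a success flag and a critique, mirroring the two roles listed above; the critique is fed into the next generation round alongside the observation and any error trace.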
Evaluation and Results
VOYAGER’s performance was evaluated against other LLM-based agents in the MineDojo framework, showcasing significantly improved results:
- Exploration and Skill Acquisition: VOYAGER acquired 3.3x more unique items, traveled 2.3x longer distances, and unlocked key tech tree milestones up to 15.3x faster than previous state-of-the-art methods.
- Zero-Shot Generalization: VOYAGER effectively transferred its learned skills to solve novel tasks in a new Minecraft world, demonstrating strong generalization capabilities that outperformed baselines.
VOYAGER is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other methods struggle to generalize.
Ablation Studies
The authors conducted ablation studies to isolate the impact of each component on VOYAGER’s performance. Removing any component resulted in diminished exploration capabilities, highlighting the importance of the integrated system.
Future Work
The authors acknowledge the computational costs associated with GPT-4. They are optimistic that future improvements in LLM APIs and the development of techniques for fine-tuning open-source LLMs will address these limitations.
Conclusion
VOYAGER demonstrates the potential of combining LLMs with an iterative learning and self-verification process to develop embodied agents capable of continuous learning and skill acquisition in open-ended environments. This research paves the way for creating more sophisticated AI systems that can adapt and learn in complex, dynamic worlds.