inclusionAI/AReaL
AReaL: A Large-Scale Asynchronous Reinforcement Learning System
| Paper | Documentation | ไธญๆๆๆกฃ | Ask DeepWiki | ๐ค Models & Data |
WeChat (ๅพฎไฟก) Group |
AReaL is an open-source fully asynchronous reinforcement learning training system for large reasoning and agentic models, developed by members from Tsinghua IIIS and the AReaL Team at Ant Group. Built upon the open-source project ReaLHF, we are fully committed to open-source principles by providing the training details, data, and infrastructure required to reproduce our results, along with the models themselves. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because itโs delicious, customizable, and affordableโwe hope you enjoy our project just as much as youโd enjoy real milk tea. Cheers!
AReaL Highlights
- โก Flexibility: Seamless customization for
agentic RL and
online RL training by simply replacing the
base_url. - ๐ Scalability: Stable fully asynchronous RL training with industry-leading speed.
- โจ Cutting-Edge Performance: State-of-the-art math, coding, search, and customer service agents.
๐ฐ News
\[2026/03/02\] We provide a complete example to train your
own ๐ฆ OpenClaw agent by simply replacing the base_url and api_key with AReaLโs RL
service - no complicated dependencies, no code changes, works with any agentic runtime!
\[2026/02/06\] We are delighted to introduce AReaL-SEA, a self-evolving data synthesis engine. Combined with RL training on AReaL, the 235B MoE model surpasses GPT 5 and achieves comparable performance with Gemini 3.0 Pro on $\tau^2$-bench! Check out the paper, model, data, and code.
\[2026/01/15\] Congrats to our friends at CAMEL-AI for open-sourcing SETA, their terminal agent RL project trained with AReaL! Check out their training workflow and the announcement on X.
๐ Previous Releases
\[2026/01/01\] Happy New Year! Thanks to the outstanding contribution from
@HwVanICI, we are excited to officially announce stable support for AReaL training on
Ascend NPU devices! The code is actively maintained and continuously updated in the
ascend branch. Check out
our documentation
to get started, and feel free to report any issues!
\[2025/08/30\] Introducing ASearcher, a state-of-the-art search agent built with AReaLโs end-to-end asynchronous RL training. Check out the paper and the open-source repository!
\[2025/07/31\] (AReaL-lite) We introduce AReaL-lite, a lightweight version of AReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite features an algorithm-first API design that prioritizes ease of use and algorithm development, while natively supporting fully asynchronous agentic RL. With 80% fewer lines of code, AReaL-lite maintains 90% of AReaLโs performance and core functionality. Check out our AReaL-lite design documentation and the quickstart guide to begin your journey with AReaL-lite!
\[2025/06/03\] (v0.3, bobaยฒ) We release bobaยฒ (double-boba) for fully asynchronous RL training, which achieves 2.77ร speedup while delivering comparable or superior training performance compared to synchronous systems. Furthermore, asynchronous RL significantly simplifies multi-turn agentic RL training setup! Check out our v0.3 overview blog and the research paper.
\[2025/03/31\] (v0.2, boba) Introducing our milestone releaseโboba! Please call it A-ReaL-boba! This release features significantly faster training with SGLang support and state-of-the-art 7B and 32B models for mathematical reasoning. Check out our v0.2 technical blog.
\[2025/02/24\] (v0.1) Our initial release includes reproducible results for 1.5B and 7B Large Reasoning Models (LRMs). Check out our v0.1 technical blog.
๐ Getting Started
First, install the package:
|
|
Our training scripts automatically download the required dataset (openai/gsm8k) and model (Qwen/Qwen2-1.5B-Instruct). To run on a single node:
|
|
To run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths in the YAML file to point to your shared storage):
|
|
For comprehensive setup instructions, see our quickstart guide.
๐ Examples
Math & Reasoning
| Task | Description | Performance |
|---|---|---|
| Math | GSM8K math reasoning with GRPO, PPO, DAPO, REINFORCE, RLOO, LitePPO, DR-GRPO, GSPO, and more | - |
| Multi-Turn Math | Multi-turn math agent with reward discounting across turns | Training Curve |
| LoRA Math | Parameter-efficient math training with LoRA (SGLang/vLLM backends) | - |
| Countdown | Countdown numbers game with custom rewards | Training Curve |
Agentic RL
| Task | Description | Performance |
|---|---|---|
| General Agent | General agentic training with any agentic frameworks | Guide |
| Tau2 Customer Service | Customer service agent on Tau2-Bench (retail, airline, telecom) | Paper |
| Search Agent | End-to-end search agent with Tongyi-DeepResearch workflow | Training Curve |
| Tool-Integrated Reasoning | Multi-turn tool calling during reasoning (Python executor, calculator) | Training Curve |
| OpenAI Agents Integration | Integration with OpenAI Agents SDK for agentic workflows | - |
| CAMEL-AI Integration | Integration with CAMEL-AI framework for agentic RL | - |
Vision-Language Models
| Task | Description | Performance |
|---|---|---|
| VLM | Geometry3K and CLEVR Count 70K visual reasoning with GRPO | - |
| VLM on NPU | VLM training on Huawei NPU hardware | Benchmark Results |
Alignment & Infrastructure
| Task | Description | Performance |
|---|---|---|
| RLHF Reward Modeling | Bradley-Terry reward modeling on Anthropic HH-RLHF | Training Curve |
| SkyPilot Deployment | Cloud deployment with SkyPilot (GCP, AWS, Kubernetes) | Screenshots |
๐ง Support Matrix
๐ง Algorithms
All RL algorithms support both asynchronous and synchronous versions by setting
max_head_offpolicyness=0. See Asynchronous RL Guide.
| Algorithm | Documentation | Paper | Configuration |
|---|---|---|---|
| GRPO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| GSPO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| PPO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| DAPO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| LitePPO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| Dr.GRPO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| REINFORCE++ | - | ๐ Paper | ๐ GSM8K Example |
| RLOO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| SAPO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| M2PO | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
| RLHF Reward Modeling | - | - | ๐ RLHF Example |
| SFT | - | - | ๐ GSM8K Example |
| Distillation | ๐ Docs | ๐ Paper | ๐ GSM8K Example |
Models
| Model Family | Megatron | PyTorch FSDP | PyTorch Archon | Notes |
|---|---|---|---|---|
| Qwen2/3 | โ | โ | โ | - |
| Qwen3-MoE | โ | โ | โ | - |
| Qwen2.5-VL | โ | โ | โ | Vision-language model |
| Qwen3-VL | โ | โ | โ | Vision-language model |
| Gemma 3 | โ | โ | โ | Vision-language model |
| Other Hugging Face LLM | โ | โ | โ | Compatibility depending on the version of transformers |
Check the AI Coding Assistant Guide and Archon Reference for how to integrate new models into AReaL.
Training Backends
| Backend | DP | Tensor Parallel | Sequence Parallel within TP | Context Parallel | Pipeline Parallel | Expert Parallel | 1D Sequence Packing | LoRA |
|---|---|---|---|---|---|---|---|---|
| Megatron | โ (ZeRO-1) | โ | โ | โ | โ | โ | โ | โ |
| PyTorch FSDP | โ (FSDP2) | โ | โ | โ | โ | โ | โ | โ |
| PyTorch Archon | โ (FSDP2) | โ | โ | โ | โ | โ | โ | โ |
Inference Backends
| Backend | Tensor Parallel | Context Parallel | Pipeline Parallel | Data Parallel Attention | Expert Parallel |
|---|---|---|---|---|---|
| vLLM | โ | โ | โ | โ | โ |
| SGLang | โ | โ | โ | โ | โ |
๐ Resources
Tutorial
Code Walkthrough
Best Practices
- Improving Algorithm Performance
- Agent Workflow Best Practices
- Debugging
- Handling OOM Issues
- Performance Profiling
Customization
Algorithms
Reference
- CLI Configurations
- Checkpointing
- Metrics Tracking
- Allocation Mode
- Rollout Workflow
- Agent Workflow
- AI-Assisted Development
๐ค Contributing
We warmly welcome contributions from the community! Whether youโre fixing bugs, adding features, improving documentation, or helping others, your contribution is valued. Please check our Contributing Guide for detailed information.
|
|
๐บ๏ธ Future Roadmap
AReaL is under active development with planned minor releases weekly and major releases monthly. We warmly welcome community engagement and contributions. We are also actively hiring interns and full-time employees with open positions in both the US and China.
๐ Acknowledgments
We gratefully acknowledge that major contributors are from the AReaL Team at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University and Ant Group.
We have also received invaluable assistance from the following groups (listed alphabetically):
-
The Data Intelligence Lab at Ant Research for their data support
-
@HwVanICI for support on vLLM, LoRA, NPU integration, and more
-
The Relaxed System Lab at HKUST for seamless collaboration on numerous system-related aspects
-
The SGLang team for supporting custom weight update features and their contributions during AReaL-lite development
-
The Super Computing Technology (SCT) team at Ant Group for their expertise in large-scale cluster operations and maintenance
-
Special thanks to @Lyken17 for providing valuable suggestions throughout the API design process
We also deeply appreciate all pioneering work from the community, particularly the ReaLHF project from OpenPsi Inc. and other outstanding projects, including but not limited to DeepScaleR, Open-Reasoner-Zero, OpenRLHF, VeRL, SGLang, QwQ, Light-R1, and DAPO.
๐ Citation
|
|
|
|