# Key Milestones

This document summarizes the major achievements and milestones across AReaL’s development history, highlighting the key contributions of each version release.

## AReaL-lite (July 2025): Simplifying RL for Everyone

📖 Step-by-step Tutorial

AReaL-lite represents a fundamental rethinking of how researchers interact with reinforcement learning systems. Born from the recognition that AReaL’s system-first architecture created barriers for AI researchers, this lightweight version prioritizes algorithm development over infrastructure complexity.

AReaL-lite achieves 90% of the original system’s functionality while reducing the codebase by 80%. Researchers can now implement complete RL workflows within a single file, leveraging intuitive PyTorch-centric APIs that feel natural to the ML community. The architecture follows familiar SPMD patterns while providing native support for asynchronous training and multi-turn agentic workflows.

Perhaps most importantly, AReaL-lite bridges the gap between research experimentation and production deployment. Its clean abstractions integrate seamlessly with existing ML tools while maintaining the performance characteristics that made AReaL successful in the first place.

**Architecture Evolution:** The releases below (v0.1-v0.3) were built on the legacy `realhf` codebase. AReaL has now fully transitioned to the new `areal` architecture, with AReaL-lite serving as both a standalone system and the foundation for future development.

## AReaL v0.3 (August 2025): Breaking the Speed Barrier

📖 Full Blog Post

Version 0.3 marked AReaL’s boldest architectural leap: completely decoupling generation from training across separate GPU clusters. This wasn’t just an incremental improvement—it fundamentally reimagined how RL systems could scale, achieving a remarkable 2.77x speedup while maintaining training stability.

The breakthrough came from solving two critical challenges that had plagued asynchronous RL: data staleness and inconsistent policy versions during generation. Through innovative staleness-aware training controls and a novel decoupled PPO objective, v0.3 turned these theoretical obstacles into practical advantages. The system now handles weight updates mid-generation without losing progress, enabling true continuous training at scale.
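
In code, the decoupled objective boils down to clipping against a recent "proximal" policy while importance-reweighting samples from the stale behavior policy that actually generated them. The following is a minimal sketch of that idea, not AReaL's implementation; the tensor shapes, names, and exact weighting scheme here are assumptions:

```python
import torch

def decoupled_ppo_loss(logp_new, logp_prox, logp_behav, advantages, eps=0.2):
    """Sketch of a decoupled PPO-style loss over per-token log-probs.

    logp_new:   log-probs under the policy being optimized
    logp_prox:  log-probs under a recent "proximal" policy (clipping anchor)
    logp_behav: log-probs under the stale behavior policy that generated
                the rollouts (possibly several weight versions old)
    """
    # Staleness correction: importance weight of the proximal policy
    # relative to the behavior policy. Detached so that only the PPO
    # ratio below carries gradients.
    staleness_weight = torch.exp(logp_prox - logp_behav).detach()
    # Standard clipped PPO term, but measured against the proximal
    # policy rather than the (stale) behavior policy.
    ratio = torch.exp(logp_new - logp_prox)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -(staleness_weight * torch.min(unclipped, clipped)).mean()
```

Because the clipping anchor no longer has to be the sampling policy, generation can keep running on old weights while training proceeds on fresh ones.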

The results spoke for themselves: AReaL’s 14B model achieved a 69.1 score on LiveCodeBench v5, establishing new state-of-the-art performance in coding tasks. Beyond raw performance, v0.3 introduced experimental support for multi-turn agentic workflows, opening new possibilities for complex reasoning applications.

| Model | LiveCodeBench v5 | Codeforces (rating / percentile) | CodeContests |
|-------|------------------|----------------------------------|--------------|
| AReaL-boba²-8B | 63.0 | 1962 / 97.5% | 40.8 |
| AReaL-boba²-14B | 69.1 | 2044 / 98.2% | 46.1 |

## AReaL v0.2 (March 2025): Engineering Excellence Meets SOTA Performance

📖 Full Blog Post

The second major release focused on transforming AReaL from a promising research prototype into a production-ready system. At the heart of v0.2 was the transition from vLLM to SGLang v0.4.0, bringing sophisticated radix attention mechanisms that dramatically improved throughput for multi-response sampling scenarios—exactly what RL training demands.

But the real breakthrough came from recognizing that data quality trumps algorithmic complexity. Our team curated a focused dataset of 106k high-quality problems, strategically filtering out cases where base models already achieved perfect accuracy. This surgical approach to data curation, combined with systematic engineering optimizations, delivered remarkable results.
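
The filtering criterion itself is simple to express. A minimal sketch, assuming (hypothetically) that each problem record carries a precomputed base-model pass rate:

```python
def curate(problems):
    """Drop problems the base model already solves with perfect accuracy,
    since they provide no learning signal for RL training.

    Each record is assumed to carry a `pass_rate` field: the fraction of
    sampled base-model attempts that were graded correct. This schema is
    illustrative, not AReaL's actual data format.
    """
    return [p for p in problems if p["pass_rate"] < 1.0]
```

In practice such a pass rate comes from sampling the base model several times per problem and grading each attempt before training begins.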

AReaL v0.2 produced the best-performing 7B mathematical reasoning model of its time, achieving 61.9 pass@1 on AIME 2024 and 48.3 on AIME 2025. Perhaps even more impressively, the team demonstrated that careful data curation could achieve competitive 32B model performance using just 200 high-quality training samples—a striking validation of the “quality over quantity” philosophy.

The system improvements were equally significant: dynamic sequence packing eliminated memory waste from padding, while data transfer with GPU-Direct RDMA enabled efficient scaling to 1000+ GPU clusters. These optimizations delivered a 1.5x throughput improvement that made large-scale experiments accessible to more research teams.
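
Conceptually, dynamic sequence packing concatenates variable-length sequences into fixed token budgets instead of padding every sequence to the batch maximum. A minimal greedy sketch (the budget and first-fit-decreasing strategy are assumptions, not AReaL's exact packer):

```python
def pack_sequences(seq_lens, max_tokens=8192):
    """Greedily pack variable-length sequences into batches that each fit
    a token budget, avoiding the padding waste of rectangular batches.
    Returns lists of sequence indices, one list per packed batch."""
    order = sorted(range(len(seq_lens)), key=lambda i: -seq_lens[i])
    batches, current, used = [], [], 0
    for i in order:
        # Close the current batch once the next sequence would overflow it.
        if used + seq_lens[i] > max_tokens and current:
            batches.append(current)
            current, used = [], 0
        current.append(i)
        used += seq_lens[i]
    if current:
        batches.append(current)
    return batches
```

A variable-length attention kernel is then typically used so that packed sequences cannot attend across each other's boundaries.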

| Model | AIME 2024 | AIME 2025 | GPQA-Diamond |
|-------|-----------|-----------|--------------|
| AReaL-boba-RL-7B | 61.9 | 48.3 | 47.6 |
| AReaL-boba-SFT-32B | 78.8 | 62.1 | 60.1 |

## AReaL v0.1 (February 2025): The Foundation That Started It All

📖 Full Blog Post

At its initial release, AReaL’s 1.5B model surpassed o1-Preview on mathematical reasoning in just 40 hours of training. v0.1 also demonstrated emergent thinking behaviors in Qwen2.5-7B through R1-Zero-style training. As training progressed, models simultaneously developed longer reasoning chains and higher accuracy—a strong signal that the system was learning to “think” rather than merely pattern-match.

The technical foundation was elegantly simple: a critic-free PPO variant that traded the learned value network for computational efficiency, paired with sparse binary rewards (+5/-5) that let mathematical correctness drive learning.
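
Such a reward function fits in a few lines: verify the final answer and emit +5 or -5, with no learned reward model. A toy sketch using exact string match as a stand-in for a real math verifier:

```python
def extract_final_answer(response: str) -> str:
    # Naive stand-in: treat the last non-empty line as the final answer.
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    return lines[-1] if lines else ""

def math_reward(response: str, reference: str) -> float:
    """Sparse binary reward: +5 if the extracted answer matches the
    reference, -5 otherwise. A production verifier would check symbolic
    or numeric equivalence rather than exact string equality."""
    return 5.0 if extract_final_answer(response) == reference.strip() else -5.0
```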

| Model Stage | MATH500 | AIME 2024 | AMC 2023 |
|-------------|---------|-----------|----------|
| Stage 1 (8K) | 85.7 | 33.2 | 74.7 |
| Stage 2 (16K) | 87.4 | 34.2 | 79.6 |
| Stage 3 (24K) | 88.0 | 40.2 | 81.2 |