
PromptCoT & PromptCoT-Mamba: Advancing the Frontiers of Reasoning

inclusionAI
Ant Group

News

  • May 30, 2025: PromptCoT-Mamba released! Introducing an attention-free foundation model for reasoning tasks.
  • Apr 11, 2025: PromptCoT-QwQ-32B model and its training data released, achieving new state-of-the-art results.
  • Mar 7, 2025: PromptCoT project launched, including the problem generation model, distilled models (PromptCoT-DS series), and associated datasets.

Overview

This repository unifies two synergistic projects aimed at advancing the frontiers of mathematical and code reasoning in Large Language Models (LLMs): PromptCoT and PromptCoT-Mamba.

PromptCoT (Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models) addresses the critical challenge of acquiring high-quality, complex problems for training advanced LLMs. It introduces a novel methodology to systematically generate Olympiad-level mathematical problems by modeling the rationale behind expert problem design. This approach not only enhances problem diversity and difficulty but also ensures logical consistency in problem construction, providing a scalable solution for creating robust training datasets.
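Under this rationale-driven recipe, synthesis can be sketched as two generation calls: first produce the expert design rationale from sampled concepts, then condition on both to produce the problem itself. The sketch below is illustrative only, not the released pipeline; `synthesize_problem`, the prompt wording, and the stub generator are hypothetical stand-ins for the open-sourced generation model.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SynthesizedProblem:
    concepts: List[str]   # sampled math concepts that seed generation
    rationale: str        # the modeled "expert thought process"
    problem: str          # the final Olympiad-style problem statement

def synthesize_problem(concepts: List[str],
                       generate: Callable[[str], str]) -> SynthesizedProblem:
    """Two-step, rationale-then-problem generation (schematic)."""
    # Step 1: ask the model how an expert would combine the given
    # concepts into a hard problem (the design rationale).
    rationale = generate(
        "Concepts: " + "; ".join(concepts) +
        "\nExplain how an expert would combine these into a hard problem."
    )
    # Step 2: condition on concepts + rationale to write the problem.
    problem = generate(
        "Concepts: " + "; ".join(concepts) +
        "\nRationale: " + rationale +
        "\nNow state the competition problem."
    )
    return SynthesizedProblem(concepts, rationale, problem)

# Demo with a stub in place of the released generation model.
stub = lambda prompt: "<model output for: " + prompt.splitlines()[0] + ">"
result = synthesize_problem(["modular arithmetic", "pigeonhole principle"], stub)
```

Keeping the rationale as an explicit intermediate output is what lets the pipeline control difficulty and check logical consistency before a problem enters the training set.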

PromptCoT-Mamba (Scaling Reasoning without Attention) leverages the problem generation capabilities of the PromptCoT pipeline to train PromptCoT-Mamba-7B, the first attention-free foundation model based on the Mamba-2 architecture. This model demonstrates that structured training curricula can enable attention-free models to surpass strong Transformer baselines on a wide array of competition-level math and code reasoning tasks, all while maintaining constant-memory inference without KV caching.
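The practical appeal of an attention-free backbone is that generation carries a fixed-size recurrent state instead of a KV cache that grows with every token. A minimal scalar state-space recurrence (a toy stand-in for Mamba-2's selective SSM, not the actual architecture) makes the constant-memory property concrete:

```python
def ssm_step(h, x, a=0.9, b=1.0, c=1.0):
    """One recurrent step of a scalar linear state-space model:
    h_t = a*h_{t-1} + b*x_t,  y_t = c*h_t."""
    h = a * h + b * x
    return h, c * h

def ssm_scan(xs):
    """Process a sequence holding O(1) state, regardless of its length."""
    h, ys = 0.0, []
    for x in xs:
        h, y = ssm_step(h, x)
        ys.append(y)
    return ys

# Impulse response decays geometrically: 1.0, 0.9, ~0.81, ...
ys = ssm_scan([1.0, 0.0, 0.0])
```

However long the input, the model only ever stores `h`; a Transformer at the same step would be holding keys and values for every previous token.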

Together, these projects offer a powerful suite of tools, models, and datasets for researchers and developers working on the cutting edge of AI reasoning.


Highlights & Key Results

1. PromptCoT: Problem Generation & Distilled Models

  • ✨ The Missing Piece for Test-Time Scaling: A lightweight yet powerful problem generation model that builds prompt sets of any scale with sufficient quality, making it well suited to SFT or RL post-training.
  • 📖 A Fully Open Project: All models (generation, distilled LLMs) and datasets (generation inputs, SFT data) are open-sourced.
  • 🏆 Superior Performance of Distilled Models:
    • PromptCoT-DS-7B consistently surpasses its base model, DeepSeek-R1-Distill-Qwen-7B, with significant gains:
      • +0.9% on MATH-500 (93.7%)
      • +3.2% on AIME2024 (58.7%)
      • +9.2% on AIME2025 (49.2%)
    • PromptCoT-DS-7B (7B parameters) achieves results comparable to larger 32B models like S1-32B and LIMO-32B.
    • PromptCoT-QwQ-32B sets a new standard, outperforming other 32B models by a significant margin:
      • MATH-500: 96.7% ± 0.5%
      • AIME2024: 83.8% ± 2.8%
      • AIME2025: 75.4% ± 4.7%
    • PromptCoT-DS-1.5B demonstrates competitive performance against RL-based models purely through distillation.
  • ⚡ Efficiency Without Compromise: PromptCoT-DS-1.5B reaches AIME scores above 40% while using over 15× fewer A100 GPU hours than models like DeepScaleR-1.5B-Preview.

2. PromptCoT-Mamba: Attention-Free Reasoning

  • 🚀 First Attention-Free SOTA: PromptCoT-Mamba-7B is the first attention-free model (Mamba-2 architecture) to outperform strong Transformer baselines in math and code reasoning.
  • 🧠 Trained with PromptCoT Pipeline: Utilizes a structured, two-stage curriculum with data generated by PromptCoT.
  • 💪 Strong General Performance: PromptCoT-Mamba-7B consistently outperforms 7B-scale Transformer and hybrid Mamba-Transformer baselines.
    • MATH-500: 84.6%
    • AIME 2024: 35.2%
    • AIME 2025: 24.6%
    • LiveCodeBench: 29.9%
  • 🎯 Math Specialization: The math-specialized variant, PromptCoT-Mamba-Math-7B, further boosts math performance:
    • MATH-500: 88.0%
    • AIME 2024: 42.9% (+7.7% over generalist)
    • AIME 2025: 30.8% (+6.2% over generalist)
  • Inference Efficiency: Offers substantial speedups (e.g., 3.66× faster on a 24GB GPU for long sequences) and constant-memory inference, ideal for cost-sensitive or long-context workloads.
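To see why constant-memory inference matters, here is a back-of-the-envelope comparison of Transformer KV-cache memory (linear in sequence length) against a fixed recurrent state. All layer/head/width numbers below are illustrative assumptions, not the released model's configuration:

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Keys and values (the leading 2x) are stored per layer, per token:
    # memory grows linearly with the generated sequence length.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

def recurrent_state_bytes(n_layers=32, state_size=128, d_inner=4096, dtype_bytes=2):
    # An SSM layer carries one fixed-size state per layer:
    # the footprint is independent of how many tokens were generated.
    return n_layers * state_size * d_inner * dtype_bytes

# With these assumed sizes, the KV cache is 0.5 GiB at 4k tokens and
# 8x that at 32k, while the recurrent state never grows.
kv_4k = kv_cache_bytes(4096)
kv_32k = kv_cache_bytes(32768)
state = recurrent_state_bytes()
```

On a 24GB GPU, the growing cache is what eventually forces small batch sizes or evictions at long context; the fixed state sidesteps that entirely.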

Performance Details

PromptCoT Series Performance

| Model | GSM8K | MATH-500 | AIME2024 | AIME2025 |
| --- | --- | --- | --- | --- |
| **🔹 1.5B Models** | | | | |
| DeepSeek-R1-Distill-Qwen-1.5B | - | 83.9% | 28.9% | 28.1% |
| STILL-3-1.5B-preview | - | 85.5% | 39.3% | - |
| DeepScaleR-1.5B-Preview | - | 🟢 87.8% | 🟢 43.1% | 🟢 37.1% |
| PromptCoT-DS-1.5B (ours) | 🟢 87.6% ± 0.5% | 85.3% ± 1.1% | 41.2% ± 6.9% | 36.7% ± 6.2% |
| **🔹 7B Models** | | | | |
| DeepSeek-R1-Distill-Qwen-7B | - | 92.8% | 55.5% | 40.0% |
| Qwen2.5-7B-SimpleRL | - | 82.4% | 26.7% | - |
| OpenThinker-7B | - | 89.6% | 30.0% | 33.3% |
| OpenR1-Qwen-7B | - | 90.6% | 36.7% | 40.0% |
| PromptCoT-DS-7B (ours) | 🔥 92.8% ± 0.5% | 🔥 93.7% ± 0.7% | 🔥 58.7% ± 3.1% | 🔥 49.2% ± 7.9% |
| **🔹 32B Models** | | | | |
| DeepSeek-R1-Distill-Qwen-32B | - | 94.3% | 72.6% | - |
| S1-32B | - | 93.0% | 56.7% | 26.6% |
| LIMO-32B | - | 94.8% | 57.1% | 46.6% |
| QwQ-32B | - | - | 82.1% | 70.8% |
| PromptCoT-QwQ-32B (ours) | 🔥🔥 96.4% ± 0.2% | 🔥🔥 96.7% ± 0.5% | 🔥🔥 83.8% ± 2.8% | 🔥🔥 75.4% ± 4.7% |

PromptCoT-Mamba Performance

General Performance:

| Model | MATH-500 | AIME 24 | AIME 25 | OlympiadBench | HumanEval | HumanEval+ | LiveCodeBench |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PromptCoT-Mamba-7B | 84.6 | 🔥🔥 35.2 | 🔥🔥 24.6 | 50.7 | 81.7 | 75.0 | 🔥🔥 29.9 |
| Gemma3-27B | 89.0 | 32.6 | 24.0 | 54.2 | 86.0 | 78.0 | 26.9 |
| Gemma3-12B | 83.8 | 22.9 | 19.2 | 49.9 | 81.1 | 73.2 | 22.2 |
| Sky-T1-7B | 85.0 | 19.2 | 19.2 | 49.2 | 41.5 | 37.2 | 18.3 |
| S1.1-7B | 82.0 | 19.2 | 17.5 | 43.1 | 64.0 | 56.7 | 13.3 |
| Bespoke-Stratos-7B | 81.2 | 18.3 | 16.3 | 45.0 | 73.2 | 68.3 | 8.6 |
| Nemotron-H-8B | 77.6 | - | - | - | 79.3 | 74.4 | - |
| M1-3B | 81.7 | 23.0 | 22.0 | 43.6 | - | - | - |

Math Specialization vs. Generalist:

| Model | MATH-500 | AIME 24 | AIME 25 | OlympiadBench | HumanEval | HumanEval+ | LiveCodeBench |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PromptCoT-Mamba-Math-7B | 🔥🔥 88.0 | 🔥🔥 42.9 | 🔥🔥 30.8 | 🔥🔥 52.1 | 71.3 | 66.5 | 20.3 |
| PromptCoT-Mamba-7B | 84.6 | 35.2 | 24.6 | 50.7 | 81.7 | 75.0 | 29.9 |

Citation

If you find PromptCoT or PromptCoT-Mamba useful in your research, please consider citing the respective papers:

For PromptCoT:

@article{zhao2025promptcot,
  author  = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Kong, Lingpeng},
  title   = {PromptCoT: Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models},
  journal = {arXiv preprint arXiv:2503.02324},
  year    = {2025},
  url     = {http://arxiv.org/abs/2503.02324}
}

For PromptCoT-Mamba:

@article{zhao2025scaling,
  author  = {Zhao, Xueliang and Wu, Wei and Kong, Lingpeng},
  title   = {Scaling Reasoning without Attention},
  journal = {arXiv preprint arXiv:2505.22425},
  year    = {2025},
  url     = {https://arxiv.org/abs/2505.22425}
}