News

  • May 30, 2025: PromptCoT-Mamba released! Introducing an attention-free foundation model for reasoning tasks.
  • Apr 11, 2025: PromptCoT-QwQ-32B model and its training data released, achieving new state-of-the-art results.
  • Mar 7, 2025: PromptCoT project launched, including the problem generation model, distilled models (PromptCoT-DS series), and associated datasets.

Overview

This repository unifies two synergistic projects aimed at advancing the frontiers of mathematical and code reasoning in Large Language Models (LLMs): PromptCoT and PromptCoT-Mamba.

PromptCoT (Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models) addresses the critical challenge of acquiring high-quality, complex problems for training advanced LLMs. It introduces a novel methodology to systematically generate Olympiad-level mathematical problems by modeling the rationale behind expert problem design. This approach not only enhances problem diversity and difficulty but also ensures logical consistency in problem construction, providing a scalable solution for creating robust training datasets.
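
As a rough illustration of how such a problem generation model might be driven, the sketch below prompts it with a few seed concepts through the Hugging Face transformers API. The model id and the concept-to-problem prompt format here are assumptions for illustration only; the released model card defines the actual interface.

# Hedged sketch: synthesizing a candidate Olympiad-style problem from seed concepts.
# The model id and prompt template below are illustrative assumptions, not the
# repository's documented interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "xl-zhao/PromptCoT-Problem-Generation-Model"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

concepts = ["modular arithmetic", "the pigeonhole principle"]
prompt = (
    "Given the following foundational concepts, design an Olympiad-level math problem.\n"
    "Concepts: " + ", ".join(concepts) + "\n"
    "Problem:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Print only the newly generated problem text, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))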

PromptCoT-Mamba (Scaling Reasoning without Attention) leverages the problem generation capabilities of the PromptCoT pipeline to train PromptCoT-Mamba-7B, the first attention-free foundation model based on the Mamba-2 architecture. This model demonstrates that structured training curricula can enable attention-free models to surpass strong Transformer baselines on a wide array of competition-level math and code reasoning tasks, all while maintaining constant-memory inference without KV caching.
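
Below is a minimal sketch of what inference with the attention-free model could look like, assuming the checkpoint loads as a standard causal LM through transformers; the model id, dtype, and decoding settings are illustrative assumptions, and the released checkpoint may instead require a Mamba-specific runtime or vLLM.

# Hedged sketch: decoding with an attention-free (Mamba-2) reasoning model.
# "xl-zhao/PromptCoT-Mamba-7B" is an assumed model id; the released checkpoint may
# ship with its own recommended inference stack.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "xl-zhao/PromptCoT-Mamba-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

question = "Find all positive integers n such that n^2 + 1 divides n^3 + 2025. Think step by step."
inputs = tokenizer(question, return_tensors="pt").to(model.device)

# The recurrent Mamba-2 state has a fixed size, so memory does not grow with the
# length of the generated chain of thought (no KV cache is accumulated).
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0], skip_special_tokens=True))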

Together, these projects offer a powerful suite of tools, models, and datasets for researchers and developers working on the cutting edge of AI reasoning.


Highlights & Key Results

1. PromptCoT: Problem Generation & Distilled Models

  • ✨ The Missing Piece for Test-Time Scaling: A lightweight yet powerful problem generation model that builds prompt sets of any scale with sufficient quality, well suited for SFT or RL post-training.
  • 📖 A Fully Open Project: All models (generation, distilled LLMs) and datasets (generation inputs, SFT data) are open-sourced.
  • 🏆 Superior Performance of Distilled Models:
    • PromptCoT-DS-7B consistently surpasses its base model, DeepSeek-R1-Distill-Qwen-7B, with significant gains:
      • +0.9% on MATH-500 (93.7%)
      • +3.2% on AIME2024 (58.7%)
      • +9.2% on AIME2025 (49.2%)
    • PromptCoT-DS-7B (7B parameters) achieves results comparable to larger 32B models like S1-32B and LIMO-32B.
    • PromptCoT-QwQ-32B sets a new standard, outperforming other 32B models by a significant margin:
      • MATH-500: 96.7% ± 0.5%
      • AIME2024: 83.8% ± 2.8%
      • AIME2025: 75.4% ± 4.7%
    • PromptCoT-DS-1.5B demonstrates competitive performance against RL-based models purely through distillation.
  • ⚡ Efficiency Without Compromise: PromptCoT-DS-1.5B reaches AIME scores above 40% while using over 15× fewer A100 GPU hours than models like DeepScaleR-1.5B-Preview.

2. PromptCoT-Mamba: Attention-Free Reasoning

  • 🚀 First Attention-Free SOTA: PromptCoT-Mamba-7B is the first attention-free model (Mamba-2 architecture) to outperform strong Transformer baselines in math and code reasoning.
  • 🧠 Trained with the PromptCoT Pipeline: Utilizes a structured, two-stage curriculum with data generated by PromptCoT.
  • 💪 Strong General Performance: PromptCoT-Mamba-7B consistently outperforms 7B-scale Transformer and hybrid Mamba-Transformer baselines.
    • MATH-500: 84.6%
    • AIME 2024: 35.2%
    • AIME 2025: 24.6%
    • LiveCodeBench: 29.9%
  • 🎯 Math Specialization: The math-specialized variant, PromptCoT-Mamba-Math-7B, further boosts math performance:
    • MATH-500: 88.0%
    • AIME 2024: 42.9% (+7.7% over generalist)
    • AIME 2025: 30.8% (+6.2% over generalist)
  • ⚡ Inference Efficiency: Offers substantial speedups (e.g., 3.66× faster on a 24 GB GPU for long sequences) and constant-memory inference, making it well suited for cost-sensitive or long-context workloads (see the sketch below).
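
One hedged way to observe the constant-memory property is to track peak GPU memory while decoding progressively longer outputs, as sketched below. The model id is again an assumption, and exact numbers depend on hardware and the inference stack.

# Hedged sketch: checking that peak memory stays roughly flat as output length grows.
# "xl-zhao/PromptCoT-Mamba-7B" is an assumed model id used for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "xl-zhao/PromptCoT-Mamba-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).cuda()

inputs = tokenizer("Prove that the square root of 2 is irrational.", return_tensors="pt").to("cuda")

for max_new in (256, 1024, 4096):
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=max_new)
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    # For an attention-free model the peak should stay roughly flat across lengths;
    # a Transformer's growing KV cache would instead push it up with max_new_tokens.
    print(f"max_new_tokens={max_new:5d}  peak GPU memory ~ {peak_gib:.2f} GiB")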

Performance Details

PromptCoT Series Performance

| Model | GSM8K | MATH-500 | AIME2024 | AIME2025 |
|---|---|---|---|---|
| 🔹 1.5B Models | | | | |
| DeepSeek-R1-Distill-Qwen-1.5B | - | 83.9% | 28.9% | 28.1% |
| STILL-3-1.5B-preview | - | 85.5% | 39.3% | - |
| DeepScaleR-1.5B-Preview | - | 🟢 87.8% | 🟢 43.1% | 🟢 37.1% |
| PromptCoT-DS-1.5B (ours) | 🟢 87.6% ± 0.5% | 85.3% ± 1.1% | 41.2% ± 6.9% | 36.7% ± 6.2% |
| 🔹 7B Models | | | | |
| DeepSeek-R1-Distill-Qwen-7B | - | 92.8% | 55.5% | 40.0% |
| Qwen2.5-7B-SimpleRL | - | 82.4% | 26.7% | - |
| OpenThinker-7B | - | 89.6% | 30.0% | 33.3% |
| OpenR1-Qwen-7B | - | 90.6% | 36.7% | 40.0% |
| PromptCoT-DS-7B (ours) | 🔥 92.8% ± 0.5% | 🔥 93.7% ± 0.7% | 🔥 58.7% ± 3.1% | 🔥 49.2% ± 7.9% |
| 🔹 32B Models | | | | |
| DeepSeek-R1-Distill-Qwen-32B | - | 94.3% | 72.6% | - |
| S1-32B | - | 93.0% | 56.7% | 26.6% |
| LIMO-32B | - | 94.8% | 57.1% | 46.6% |
| QwQ-32B | - | - | 82.1% | 70.8% |
| PromptCoT-QwQ-32B (ours) | 🔥🔥 96.4% ± 0.2% | 🔥🔥 96.7% ± 0.5% | 🔥🔥 83.8% ± 2.8% | 🔥🔥 75.4% ± 4.7% |

PromptCoT-Mamba Performance

General Performance:

| Model | MATH-500 | AIME 24 | AIME 25 | OlympiadBench | HumanEval | HumanEval+ | LiveCodeBench |
|---|---|---|---|---|---|---|---|
| PromptCoT-Mamba-7B | 84.6 | 🔥🔥 35.2 | 🔥🔥 24.6 | 50.7 | 81.7 | 75.0 | 🔥🔥 29.9 |
| Gemma3-27B | 89.0 | 32.6 | 24.0 | 54.2 | 86.0 | 78.0 | 26.9 |
| Gemma3-12B | 83.8 | 22.9 | 19.2 | 49.9 | 81.1 | 73.2 | 22.2 |
| Sky-T1-7B | 85.0 | 19.2 | 19.2 | 49.2 | 41.5 | 37.2 | 18.3 |
| S1.1-7B | 82.0 | 19.2 | 17.5 | 43.1 | 64.0 | 56.7 | 13.3 |
| Bespoke-Stratos-7B | 81.2 | 18.3 | 16.3 | 45.0 | 73.2 | 68.3 | 8.6 |
| Nemotron-H-8B | 77.6 | - | - | - | 79.3 | 74.4 | - |
| M1-3B | 81.7 | 23.0 | 22.0 | 43.6 | - | - | - |

Math Specialization vs. Generalist:

| Model | MATH-500 | AIME 24 | AIME 25 | OlympiadBench | HumanEval | HumanEval+ | LiveCodeBench |
|---|---|---|---|---|---|---|---|
| PromptCoT-Mamba-Math-7B | 🔥🔥 88.0 | 🔥🔥 42.9 | 🔥🔥 30.8 | 🔥🔥 52.1 | 71.3 | 66.5 | 20.3 |
| PromptCoT-Mamba-7B | 84.6 | 35.2 | 24.6 | 50.7 | 81.7 | 75.0 | 29.9 |

Citation

If you find PromptCoT or PromptCoT-Mamba useful in your research, please consider citing the respective papers:

For PromptCoT:

@article{zhao2025promptcot,
  author    = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Kong, Lingpeng},
  title     = {PromptCoT: Synthesizing Olympiad-Level Problems for Mathematical Reasoning in Large Language Models},
  year      = {2025},
  journal   = {arXiv preprint arXiv:2503.02324},
  url       = {http://arxiv.org/abs/2503.02324}
}

For PromptCoT-Mamba:

@article{zhao2025scaling,
  author    = {Xueliang Zhao and Wei Wu and Lingpeng Kong},
  title     = {Scaling Reasoning without Attention},
  journal   = {arXiv preprint arXiv:2505.22425},
  year      = {2025},
  url       = {https://arxiv.org/abs/2505.22425}
}