This organization contains a series of open-source projects from Ant Group, with dedicated efforts to work towards Artificial General Intelligence (AGI).

Introducing Ming-Lite-Omni V1.5

GITHUB | 🤗 Hugging Face | 🤖 ModelScope We are excited to introduce Ming-lite-omni V1.5, a comprehensive upgrade that significantly enhances the omni-modal capabilities of the original Ming-lite-omni model (find it on 🤗 Hugging Face). This new version delivers remarkable improvements across a wide range of tasks, including image and text understanding, document analysis, video comprehension, speech understanding and synthesis, as well as image generation and editing. Built on the Ling-lite-1.5 architecture, Ming-lite-omni V1....

July 21, 2025 · 15 min · 3021 words · inclusionAI, Ant Group

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

📖 Technical Report | 🤗 Hugging Face | 🤖 ModelScope Introduction We introduce M2-Reasoning-7B, a model designed to excel in both general and spatial reasoning. Our approach integrates two key innovations: (1) a novel data pipeline that generates 294.2K high-quality data samples (168K for cold-start fine-tuning and 126.2K for RLVR), which feature logically coherent reasoning trajectories and have undergone comprehensive assessment; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, and with task-specific rewards that deliver tailored incentive signals....
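The excerpt above mentions task-specific rewards that deliver tailored incentive signals during RLVR training. As a rough illustration only, the sketch below shows how rollouts from different task families might be dispatched to different verifiable reward functions; the task names, answer format, and reward rules are assumptions for illustration, not M2-Reasoning's actual implementation.

```python
import re

# Illustrative sketch: per-task verifiable rewards for multi-task RLVR.
# The "Answer:" format, task names, and tolerances below are hypothetical.

def extract_final_answer(response: str) -> str:
    """Take the text after the last 'Answer:' marker (assumed output format)."""
    return re.split(r"Answer:\s*", response)[-1].strip()

def exact_match_reward(response: str, reference: str) -> float:
    """General reasoning: binary reward for an exact final-answer match."""
    return 1.0 if extract_final_answer(response) == reference.strip() else 0.0

def numeric_tolerance_reward(response: str, reference: str, tol: float = 0.05) -> float:
    """Spatial reasoning: accept numeric answers within a relative tolerance."""
    try:
        pred, ref = float(extract_final_answer(response)), float(reference)
    except ValueError:
        return 0.0
    return 1.0 if abs(pred - ref) <= tol * max(abs(ref), 1e-8) else 0.0

REWARD_FNS = {
    "general": exact_match_reward,
    "spatial": numeric_tolerance_reward,
}

def compute_reward(task: str, response: str, reference: str) -> float:
    """Dispatch each rollout to the reward function for its task family."""
    return REWARD_FNS[task](response, reference)

# Example: a spatial-reasoning rollout scored with the tolerance-based reward.
print(compute_reward("spatial", "The distance is roughly Answer: 1.98", "2.0"))  # -> 1.0
```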

July 11, 2025 · 5 min · 1052 words · inclusionAI, Ant Group

ABench: An Evolving Open-Source Benchmark

GITHUB

🌟 Overview ABench is an evolving open-source benchmark suite designed to rigorously evaluate and enhance Large Language Models (LLMs) on complex cross-domain tasks. By targeting current model weaknesses, ABench provides systematic challenges in high-difficulty specialized domains, including physics, actuarial science, logical reasoning, law, and psychology.

🎯 Core Objectives
- Address Evaluation Gaps: Design high-differentiation assessment tasks targeting underperforming question types
- Establish Unified Standards: Create reliable, comparable benchmarks for multi-domain LLM evaluation
- Expand Capability Boundaries: Drive continuous optimization of knowledge systems and reasoning mechanisms through challenging innovative problems

📊 Dataset Release Status

| Domain | Description | Status |
|---|---|---|
| Physics | 500 university/competition-level physics problems (400 static + 100 dynamic parametric variants) covering 10+ fields from classical mechanics to modern physics | ✅ Released |
| Actuary | Curated actuarial exam problems covering core topics: probability statistics, financial mathematics, life/non-life insurance, actuarial models, and risk management | ✅ Released |
| Logic | High-differentiation logical reasoning problems from authoritative tests (LSAT/GMAT/GRE/SBI/Chinese Civil Service Exam) | 🔄 In Preparation |
| Psychology | Psychological case studies and research questions (objective/subjective) evaluating understanding of human behavior and theories | 🔄 In Preparation |
| Law | Authoritative judicial exam materials covering core legal domains: criminal/civil/administrative/procedural/international law | 🔄 In Preparation |
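One detail worth unpacking from the Physics row above is the notion of dynamic parametric variants: problems whose numeric parameters are re-sampled so that memorized answers to the static version do not transfer. The toy sketch below illustrates the idea with a made-up projectile template; it is not taken from ABench itself.

```python
import random

# Toy illustration of a "dynamic parametric variant": a physics problem template
# whose numbers are re-sampled per draw, so each instance has a fresh ground truth.
# The template and answer formula are hypothetical, not part of ABench.

def projectile_variant(seed: int) -> dict:
    rng = random.Random(seed)
    v0 = rng.uniform(10.0, 40.0)   # initial speed, m/s
    g = 9.8                        # gravitational acceleration, m/s^2
    question = (f"A ball is thrown straight up at {v0:.1f} m/s. Ignoring air "
                f"resistance, how long until it returns to the thrower? (g = {g} m/s^2)")
    answer = 2 * v0 / g            # total time of flight, seconds
    return {"question": question, "answer": round(answer, 2)}

# Each seed yields a different variant with a different ground-truth answer.
print(projectile_variant(0))
print(projectile_variant(1))
```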

July 8, 2025 · 1 min · 185 words · inclusionAI, Ant Group

AWorld: The Agent Runtime for Self-Improvement

“Self-awareness: the hardest problem isn’t solving within limits, it’s discovering one’s own limitations” Table of Contents News — Latest updates and announcements. Introduction — Overview and purpose of the project. Installation — Step-by-step setup instructions. Quick Start — Get started with usage examples. Architecture — Explore the multi-agent system design. Demo — See the project in action with demonstrations. Contributing — How to get involved and contribute. License — Project licensing details....

July 7, 2025 · 5 min · 895 words · inclusionAI, Ant Group

Ming-Omni: A Unified Multimodal Model for Perception and Generation

GITHUB 📑 Technical Report | 📖 Project Page | 🤗 Hugging Face | 🤖 ModelScope Introduction Ming-lite-omni is a light version of Ming-omni, derived from Ling-lite and featuring 2.8 billion activated parameters. Ming-lite-omni is a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-lite-omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers....
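The excerpt above describes the core architectural idea: dedicated per-modality encoders feeding shared MoE experts behind modality-specific routers, so tokens from different modalities can be gated differently. The sketch below (PyTorch-style, with illustrative sizes and a simple top-1 routing rule that are not Ming-Omni's actual configuration) is only meant to show what a per-modality gating network over shared experts could look like.

```python
import torch
import torch.nn as nn

# Minimal sketch of modality-specific routing over shared MoE experts.
# Dimensions, expert count, and top-1 gating are illustrative assumptions.

class ModalityRoutedMoE(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 8,
                 modalities=("text", "image", "audio", "video")):
        super().__init__()
        # Experts are shared across all modalities.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        # One gating network (router) per modality.
        self.routers = nn.ModuleDict({m: nn.Linear(d_model, n_experts) for m in modalities})

    def forward(self, tokens: torch.Tensor, modality: str) -> torch.Tensor:
        # tokens: (batch, seq, d_model), all produced by the same modality's encoder.
        gate_logits = self.routers[modality](tokens)      # (B, S, n_experts)
        gate_weights = gate_logits.softmax(dim=-1)
        top_w, top_idx = gate_weights.max(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                           # tokens routed to expert e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(tokens[mask])
        return out

# Example: image tokens pass through the image-specific router; text would use its own.
moe = ModalityRoutedMoE()
image_tokens = torch.randn(2, 16, 512)
print(moe(image_tokens, "image").shape)  # torch.Size([2, 16, 512])
```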

June 11, 2025 · 7 min · 1379 words · inclusionAI, Ant Group