Blog

Ming-Omni: A Unified Multimodal Model for Perception and Generation

GITHUB 📑 Technical Report | 📖 Project Page | 🤗 Hugging Face | 🤖 ModelScope Introduction Ming-lite-omni is a light version of Ming-omni, derived from Ling-lite, with 2.8 billion activated parameters. Ming-lite-omni is a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-lite-omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers....
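The excerpt describes per-modality encoders feeding a shared MoE backbone whose gating is modality-specific. As a rough illustration of that idea, the sketch below builds a toy MoE layer with a shared expert pool and one router per modality; the modality set, expert count, hidden size, and top-k value are assumptions chosen for this example, not the actual Ming-lite-omni configuration.

```python
# Minimal sketch of modality-specific routing over a shared expert pool (PyTorch).
# All sizes and names are illustrative assumptions, not the Ming-lite-omni setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

MODALITIES = ["text", "image", "audio", "video"]  # hypothetical modality set

class ModalityRoutedMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Shared pool of FFN experts, used by every modality.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        # One gating network (router) per modality.
        self.routers = nn.ModuleDict(
            {m: nn.Linear(d_model, n_experts) for m in MODALITIES}
        )

    def forward(self, tokens: torch.Tensor, modality: str) -> torch.Tensor:
        # tokens: (num_tokens, d_model), all from the same modality here.
        logits = self.routers[modality](tokens)          # (T, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](tokens[mask])
        return out

# Usage: image tokens are routed by the image-specific gate.
layer = ModalityRoutedMoE()
img_tokens = torch.randn(16, 512)
print(layer(img_tokens, "image").shape)  # torch.Size([16, 512])
```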

June 11, 2025 · 7 min · 1379 words · inclusionAI, Ant Group

Ling: A MoE LLM Provided and Open-sourced by InclusionAI

🤗 Hugging Face | 🤖 ModelScope Introduction Ling is a MoE LLM provided and open-sourced by InclusionAI. We introduce two sizes: Ling-lite and Ling-plus. Ling-lite has 16.8 billion parameters with 2.75 billion activated parameters, while Ling-plus has 290 billion parameters with 28.8 billion activated parameters. Both models demonstrate impressive performance compared to existing models in the industry. Their architecture scales up and down easily and adapts to different workloads, so users can apply these models to a wide range of tasks, from natural language processing to complex problem solving....
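The gap between total and activated parameters comes from sparse expert routing: each token is sent to only a few experts, so only a fraction of the expert weights participates in any forward pass. The sketch below shows the back-of-the-envelope arithmetic for a generic top-k MoE layer; the numbers are hypothetical and are not Ling's actual configuration.

```python
# Total vs. activated parameters in a sparse MoE layer.
# All figures are hypothetical, chosen only to illustrate why an MoE model's
# activated parameter count is far below its total parameter count.
def moe_param_counts(d_model: int, d_ff: int, n_experts: int, top_k: int):
    expert_params = 2 * d_model * d_ff   # up- and down-projection per expert
    total = n_experts * expert_params    # all experts stored in memory
    activated = top_k * expert_params    # experts actually executed per token
    return total, activated

total, activated = moe_param_counts(d_model=2048, d_ff=5632, n_experts=64, top_k=4)
print(f"total expert params per layer:     {total / 1e9:.2f} B")
print(f"activated expert params per token: {activated / 1e9:.2f} B")
```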

May 8, 2025 · 8 min · 1574 words · inclusionAI, Ant Group

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

GITHUB 📑 Paper | 🤗 Hugging Face | 🤖 ModelScope Introduction Ming-Lite-Uni is an open-source multimodal framework that includes a newly developed unified visual generator and a native multimodal autoregressive model designed to integrate vision and language. The project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing innovative multi-scale learnable tokens and a multi-scale representation alignment strategy. Ming-Lite-Uni pairs a fixed MLLM with a learnable diffusion model, allowing native multimodal AR models to perform text-to-image generation and instruction-based image editing, extending their capabilities beyond visual comprehension alone....
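The "fixed MLLM plus learnable diffusion model" pairing means the multimodal LM is frozen and only supplies conditioning features, while a diffusion decoder is trained to generate image latents from those features. The sketch below illustrates that split in toy form; the module names, shapes, and pooling scheme are assumptions for this example, not the Ming-Lite-Uni implementation.

```python
# Toy sketch: frozen multimodal LM provides conditioning, trainable diffusion
# decoder denoises image latents. Shapes and names are illustrative assumptions.
import torch
import torch.nn as nn

class FrozenMLLM(nn.Module):
    """Stand-in for a pretrained multimodal LM; its weights stay frozen."""
    def __init__(self, d_model=1024):
        super().__init__()
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2,
        )
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, token_embeds):        # (B, T, d_model)
        return self.backbone(token_embeds)  # conditioning features

class DiffusionDecoder(nn.Module):
    """Toy denoiser conditioned on MLLM features (the trainable part)."""
    def __init__(self, d_latent=64, d_cond=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_latent + d_cond, 512), nn.SiLU(),
            nn.Linear(512, d_latent),
        )

    def forward(self, noisy_latent, cond):
        return self.net(torch.cat([noisy_latent, cond], dim=-1))

mllm, decoder = FrozenMLLM(), DiffusionDecoder()
tokens = torch.randn(2, 32, 1024)      # hypothetical prompt embeddings
cond = mllm(tokens).mean(dim=1)        # pooled conditioning, shape (2, 1024)
noisy = torch.randn(2, 64)             # noisy image latents
pred = decoder(noisy, cond)            # decoder's denoising prediction
# Only decoder.parameters() would be handed to the optimizer.
```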

May 7, 2025 · 6 min · 1133 words · inclusionAI, Ant Group

Ming-Lite-Omni-Preview: A MoE Model Designed to Perceive a Wide Range of Modalities

GITHUB 🤗 Hugging Face | 🤖 ModelScope Introduction Ming-Lite-Omni-Preview is built upon Ling-Lite; it is a MoE model designed to perceive a wide range of modalities, including text, images, audio, and video, while generating text and natural speech in a streaming manner. To naturally handle these diverse modalities, we have enhanced Ling-Lite by incorporating modality-specific routers. As a result, Ming-Omni excels at handling information from diverse modalities and is highly scalable....
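Generating "in a streaming manner" means text and speech are emitted incrementally rather than only after the full response is finished. The loop below is a generic sketch of interleaved streaming decoding; `lm_step`, `vocoder_chunk`, and the chunking scheme are hypothetical placeholders, not the model's actual interface.

```python
# Generic sketch of streaming interleaved text + speech decoding.
# `lm_step` and `vocoder_chunk` are hypothetical stand-ins for a real model
# and vocoder, not Ming-Lite-Omni-Preview's API.
import random

def lm_step(context):
    """Pretend autoregressive step: returns the next (modality, token) pair."""
    return random.choice(["text", "audio"]), random.randint(0, 1023)

def vocoder_chunk(audio_tokens):
    """Pretend vocoder: turns a small buffer of audio tokens into samples."""
    return [t / 1024.0 for t in audio_tokens]

def stream_response(prompt_tokens, max_steps=32, audio_buffer_size=8):
    context, audio_buffer = list(prompt_tokens), []
    for _ in range(max_steps):
        modality, token = lm_step(context)
        context.append(token)
        if modality == "text":
            yield ("text", token)                       # emit text immediately
        else:
            audio_buffer.append(token)
            if len(audio_buffer) == audio_buffer_size:  # emit speech in small chunks
                yield ("speech", vocoder_chunk(audio_buffer))
                audio_buffer.clear()
    if audio_buffer:                                    # flush remaining audio
        yield ("speech", vocoder_chunk(audio_buffer))

for kind, payload in stream_response([1, 2, 3]):
    print(kind, payload)
```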

May 5, 2025 · 6 min · 1105 words · inclusionAI, Ant Group

Agentic Learning

Introduction An agent exhibits powerful capabilities by interacting with an external environment and making decisions based on the feedback it receives. For complex problems, an agent often needs multiple turns of interaction with the environment to reach a solution. The complexity and dynamism of environments, coupled with the need for multi-turn interaction, pose numerous challenges for agent training. We introduce AgenticLearning, an open-source agent training paradigm designed to empower researchers to train and evaluate autonomous agents effectively....
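The setting described above is an agent that must act, observe feedback, and adapt over several turns before reaching a solution. The rollout loop below sketches that multi-turn interaction in generic form; the `ToyAgent` and `ToyEnvironment` interfaces are assumptions for illustration, not the AgenticLearning API.

```python
# Generic multi-turn agent-environment rollout, recorded as a trajectory that
# a trainer could later learn from. The interfaces here are illustrative
# assumptions, not the AgenticLearning API.
from dataclasses import dataclass, field

@dataclass
class Turn:
    observation: str
    action: str
    feedback: str
    done: bool

@dataclass
class Trajectory:
    turns: list[Turn] = field(default_factory=list)

class ToyEnvironment:
    """Tiny environment: the task is solved once the agent says 'submit'."""
    def __init__(self, task):
        self.task, self.steps = task, 0

    def observe(self):
        return f"task: {self.task} (step {self.steps})"

    def step(self, action):
        self.steps += 1
        done = action == "submit" or self.steps >= 5
        return ("accepted" if done else "keep going"), done

class ToyAgent:
    """Tiny policy: explores twice, then submits."""
    def act(self, observation, history):
        return "submit" if len(history) >= 2 else "explore"

def rollout(agent, env):
    traj = Trajectory()
    while True:
        obs = env.observe()
        action = agent.act(obs, traj.turns)   # decision based on past feedback
        feedback, done = env.step(action)     # environment responds
        traj.turns.append(Turn(obs, action, feedback, done))
        if done:
            return traj

for t in rollout(ToyAgent(), ToyEnvironment("find the bug")).turns:
    print(t)
```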

April 1, 2025 · 3 min · 446 words · inclusionAI, Ant Group