Ming-flash-omni-Preview: A Sparse, Unified Architecture for Multimodal Perception and Generation
GitHub | 🤗 Hugging Face | 🤖 ModelScope

Ming-omni series update! Ming-flash-omni-Preview is the first open-source omnimodal large model to reach the 100-billion-parameter scale. Built on the sparse MoE architecture of Ling 2.0, Ming-flash-omni-Preview has 103B total parameters, of which 9B are activated per token. Compared to the previous version, Ming-lite-omni-1.5, Ming-flash-omni-Preview improves both omnimodal understanding and generation. Its overall performance across modalities leads open-source omnimodal models, with particularly strong results in controllable image generation, streaming video understanding, and speech recognition.
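To make the "103B total / 9B activated" figure concrete, the sketch below shows how a generic sparse top-k MoE layer activates only a few experts per token. This is a minimal illustration of the general technique, not the actual Ming-flash-omni-Preview or Ling 2.0 implementation; the module names, expert count, and top-k value are all illustrative assumptions.

```python
# Minimal sketch of sparse top-k MoE routing (illustrative only; not the
# actual Ming-flash-omni-Preview / Ling 2.0 implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, num_experts=64, top_k=2):
        super().__init__()
        # Each expert is an independent feed-forward block; total parameters
        # grow with num_experts, but only top_k experts run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # token -> expert scores
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 1024)
layer = SparseMoELayer()
print(layer(tokens).shape)  # torch.Size([8, 1024])
```

Because only `top_k` of `num_experts` experts execute for each token, the activated parameter count stays a small fraction of the total, which is how a 103B-parameter model can run inference with only 9B parameters active.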