Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
GitHub | 🤗 Hugging Face | 🤖 ModelScope

🚀 Technical Highlights

First unified continuous speech tokenizer for both understanding and generation tasks: MingTok-Audio is a unified continuous speech tokenizer based on a VAE framework with a causal Transformer architecture. It is the first continuous speech tokenizer to effectively integrate semantic and acoustic features, and it enables a closed-loop system with LLMs through hierarchical feature representations, which makes it suitable for both understanding and generation tasks....
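To make two of the terms above concrete, here is a toy sketch of what "causal" and "continuous VAE latents" mean in practice. It is not MingTok-Audio's implementation: a cumulative mean stands in for the causal Transformer, the fixed log-variance is an arbitrary assumption, and the function names `causal_mean_pool` and `encode` are hypothetical.

```python
import math
import random


def causal_mean_pool(frames):
    """Stand-in for a causal encoder: output t averages inputs 0..t only."""
    out = []
    running = [0.0] * len(frames[0])
    for t, frame in enumerate(frames, start=1):
        running = [r + x for r, x in zip(running, frame)]
        out.append([r / t for r in running])
    return out


def encode(frames, rng):
    """Map frames to continuous latents with the VAE reparameterization trick."""
    latents = []
    for h in causal_mean_pool(frames):
        mu = h                     # toy posterior mean
        log_var = [-4.0] * len(h)  # toy fixed log-variance (assumption)
        # z = mu + sigma * eps, with eps drawn from a standard normal
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, log_var)]
        latents.append(z)
    return latents


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for z in encode(frames, random.Random(0)):
        print(z)
```

Each latent is a real-valued vector rather than a codebook index, and the latent for frame t depends only on frames up to t, which is the property that lets such a tokenizer run in a streaming, autoregressive loop with an LLM.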
