An industrial-grade speech recognition toolkit that offers real-time processing, supports over 50 languages, and includes features such as speaker diarization and emotion detection.
Discovered via GitHub:modelscope.