# Speech Recognition: From HMMs to Whisper

Status: public · Confidence: medium (0.8) · Basis: verified_sources

## TL;DR

Speech recognition converts audio into text using acoustic modeling, self-supervised pretraining, and sequence decoding. This repair maps claims to primary ASR papers.

## Core Explanation

The previous article had partial source coverage. This version keeps three direct claims from Deep Speech, wav2vec 2.0, and Whisper.

## Further Reading

- [Deep Speech: Scaling up end-to-end speech recognition](https://arxiv.org/abs/1412.5567)
- [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477)
- [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)