wtf is mega-asr?

xzf-thu/mega-asr — explained in plain English

Analysis updated 2026-05-18

★ 93 · Python · Audience · researcher · Complexity · 4/5 · Setup · hard

Why would anyone build with this?

REASON 1

Transcribe noisy real-world audio with strong background interference

REASON 2

Benchmark in-the-wild speech recognition against Whisper and Qwen3-ASR

REASON 3

Fine-tune an ASR foundation model on custom acoustic conditions

REASON 4

Research A2S-SFT and DG-WGPO training recipes

What's in the stack?

Python · PyTorch · Hugging Face

How it stacks up

	xzf-thu/mega-asr	agricidaniel/claude-shorts	calesthio/generative-media-skills
Stars	93	93	93
Language	Python	Python	Python
Setup difficulty	hard	moderate	easy
Complexity	4/5	3/5	2/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you spin it up?

Difficulty · hard · Time to first run · 1 day+

This is a foundation model trained on 2.6M samples, so expect to need a GPU, Hugging Face downloads, and a multi-step inference setup.

Wtf does this do

MEGA-ASR is a speech recognition model from a group at Tsinghua University aimed at transcribing audio captured in messy real-world conditions, rather than the clean studio recordings that most speech models are tested on. The README frames it as a foundation model for what the authors call in-the-wild speech recognition, meaning audio with background noise, far-field microphones, obstructions, echoes and reverberation, recording artifacts, electronic distortion, and dropped pieces of transmission.

The training set is described as 2.6 million samples covering 7 atomic acoustic conditions and 54 compound scenarios where those conditions stack on top of each other. The authors report up to roughly 30 percent gains over leading open and closed source models on these harder cases. Two training techniques are named in the README: A2S-SFT for supervised fine-tuning, and a reinforcement learning step called DG-WGPO. The README does not explain what those acronyms stand for or how they work in detail, so a non-technical reader will mostly take them as the labels of the recipes used.

Most of the README is a side-by-side comparison table where short audio clips are transcribed by MEGA-ASR and by other systems, including Qwen3-ASR, Gemini-3-Pro, Seed-ASR, and Whisper. Each row shows the ground-truth text, each model's transcription, and a Word Error Rate score. In the examples shown, MEGA-ASR produces lower error rates on the hard clips while the other systems often return empty output, hallucinate unrelated text, or drop large portions of the sentence.

The project links out to a technical report on arXiv, the Voices-in-the-Wild-2M training dataset on Hugging Face, the model weights on Hugging Face, a separate benchmark repository called Voices-in-the-Wild-Bench, and a project page. The README in this repository is mostly the marketing-style introduction and the comparison samples.
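For readers who have not met Word Error Rate before, here is a minimal sketch of how such a score is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the ground-truth text. This is a generic illustration, not the scoring code used by MEGA-ASR or its benchmark. The function name, the lowercase-and-split normalization, and the sample sentences are all assumptions made for the example, and real evaluations often normalize punctuation and numbers differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption;
    the project's own scoring may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "turn the lights off"  # made-up ground truth, not from the README
    print(word_error_rate(truth, "turn the lights off"))  # perfect match: 0.0
    print(word_error_rate(truth, "turn the light off"))   # one substitution: 0.25
    print(word_error_rate(truth, ""))                     # empty output: 1.0
```

An empty transcription scores 1.0 because every reference word counts as a deletion. A hallucinated transcription can score above 1.0, since inserted words add to the error count while the denominator stays fixed at the reference length.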

Yoink these prompts

Prompt 1

Show me how to load the Mega-ASR weights from Hugging Face and transcribe a noisy wav file

Prompt 2

Compare Mega-ASR and Whisper on three hard clips and compute Word Error Rate for each

Prompt 3

Summarize what the A2S-SFT and DG-WGPO training steps in Mega-ASR are doing at a high level

Prompt 4

Build a small benchmark script around Voices-in-the-Wild-Bench to evaluate my own ASR model

Frequently asked questions

wtf is mega-asr?

A speech recognition foundation model from a group at Tsinghua University, tuned for noisy, far-field, in-the-wild audio. The authors report up to roughly 30 percent gains over leading open and closed source models on hard clips.

What language is mega-asr written in?

Mainly Python. The stack also includes PyTorch and Hugging Face.

How hard is mega-asr to set up?

Setup difficulty is rated hard, with roughly 1 day+ to a first successful run.

Who is mega-asr for?

Mainly researchers.


Don't trust strangers blindly. Verify against the repo.
