April 21

Hi all,

I’m interested in learning from people using Whisper for batch subtitle generation.

My main challenge is reducing manual cleanup afterward. In particular, I’m looking for advice on handling:

- repeated tiny subtitle fragments
- music being transcribed as false subtitles
- subtitles starting too early
- subtitles staying on screen too long

For those doing batch processing, what works best to clean this up efficiently? Are you using audio preprocessing, VAD, post-processing rules, or specific subtitle timing adjustments?

I’d love to hear what has worked well in real-life workflows.

Thanks.
May 3 (Author)

Hi all,

Just to add some context to my earlier question: I’ve been experimenting with Whisper in a real-world scenario (travel video with ambient sound, speech, and background music):

https://www.youtube.com/watch?v=Cq2P8kFgOSI

The Thai subtitles in this video were generated fully automatically using Whisper. The English, German, French, and Dutch subtitles are translations based on the Whisper-detected Thai text.

While the raw transcription quality is quite good, I’m still running into several practical issues when doing batch processing:

- Fragmentation: many very short subtitle segments that ideally should be merged into more natural sentences.
- Music hallucinations: background music is sometimes incorrectly transcribed as speech.
- Timing drift:
  - subtitles appearing slightly too early
  - subtitles remaining on screen too long after speech ends

What I’ve tried so far:

- Basic post-processing (merging segments based on short gaps)
- Filtering very short segments (e.g. < 0.5 s)
- Forcing the language instead of auto-detect (improves consistency)

What I haven’t fully solved yet:

- Reliable removal of non-speech (music) without losing quiet speech
- More natural subtitle timing (reading speed vs. strict alignment)
- A scalable batch workflow that minimizes manual cleanup

I’m especially interested in:

- Whether people are using VAD (e.g. Silero, WebRTC) before Whisper
- Proven heuristics for merging/splitting subtitle segments
- Tools or pipelines that significantly reduce manual post-editing

I’d really appreciate hearing what has worked well in practice, especially for batch workflows.

Thanks!
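
For anyone who wants something concrete to compare against, the post-processing steps mentioned above (merging on short gaps, dropping very short segments) plus a simple cap on display time could look roughly like the sketch below. It is only an illustration: the `Segment` type, the function names, the joiner, and every threshold except the 0.5 s filter are assumptions made for this example, not values from an actual pipeline, and they would need tuning per language and per type of footage.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for one subtitle cue (times in seconds)."""
    start: float
    end: float
    text: str


def merge_short_gaps(segments, max_gap=0.3, max_chars=80, joiner=" "):
    """Merge a segment into the previous one when the silence between them
    is at most max_gap and the combined text stays within max_chars.
    max_gap, max_chars and joiner are assumed values, not recommendations."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (prev is not None
                and seg.start - prev.end <= max_gap
                and len(prev.text) + len(joiner) + len(seg.text) <= max_chars):
            merged[-1] = Segment(prev.start, seg.end, prev.text + joiner + seg.text)
        else:
            merged.append(Segment(seg.start, seg.end, seg.text))
    return merged


def drop_short(segments, min_duration=0.5):
    """Discard segments shorter than min_duration (the 0.5 s filter)."""
    return [s for s in segments if s.end - s.start >= min_duration]


def cap_duration(segments, chars_per_second=15.0, min_duration=1.0):
    """Shorten cues that stay on screen longer than a reading-speed budget.
    The budget (text length / chars_per_second, at least min_duration)
    is an assumed heuristic; the start time is left untouched."""
    capped = []
    for s in segments:
        allowed = max(min_duration, len(s.text) / chars_per_second)
        capped.append(Segment(s.start, min(s.end, s.start + allowed), s.text))
    return capped


def clean(segments):
    """Run the three steps in order: merge, then filter, then cap."""
    return cap_duration(drop_short(merge_short_gaps(segments)))
```

Merging runs before filtering on purpose, so that a tiny fragment sitting right next to real speech is absorbed into it rather than deleted. Fragments that remain isolated and short after merging are the ones dropped. This does nothing about cues that start too early, which would need a separate adjustment of the start times.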