Thailand News and Discussion Forum | ASEANNOW

How do you clean up Whisper subtitles at scale?

Hi all,

I’m interested in learning from people using Whisper for batch subtitle generation.

My main challenge is reducing manual cleanup afterward. In particular, I’m looking for advice on handling:

  • repeated tiny subtitle fragments

  • music being transcribed as false subtitles

  • subtitles starting too early

  • subtitles staying on screen too long

For those doing batch processing, what works best to clean this up efficiently? Are you using audio preprocessing, VAD, post-processing rules, or specific subtitle timing adjustments?

I’d love to hear what has worked well in real-life workflows.

Thanks.

  • 2 weeks later...
  • Author

Hi all,

To add some context to my earlier question, I’ve been experimenting with Whisper on a real-world case, a travel video with ambient sound, speech, and background music:

https://www.youtube.com/watch?v=Cq2P8kFgOSI

The Thai subtitles in this video were generated fully automatically using Whisper. The English, German, French, and Dutch subtitles were translated from the Whisper-transcribed Thai text.

While the raw transcription quality is quite good, I’m still running into several practical issues when doing batch processing:

  • Fragmentation
    Many very short subtitle segments that ideally should be merged into more natural sentences.

  • Music hallucinations
    Background music is sometimes incorrectly transcribed as speech.

  • Timing drift

    • Subtitles appearing slightly too early

    • Subtitles remaining on screen too long after speech ends
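For the music hallucinations listed above, one common first-pass filter looks at the per-segment confidence fields. The sketch below assumes segment dicts shaped like the output of openai-whisper's `transcribe()` (`no_speech_prob`, `avg_logprob`, `compression_ratio`); other Whisper implementations may expose these differently. The thresholds mirror Whisper's documented defaults and are only starting points, and the function names are illustrative.

```python
def looks_hallucinated(seg, no_speech=0.6, min_logprob=-1.0, max_compression=2.4):
    """Heuristic check on one segment dict (assumed openai-whisper style fields)."""
    # Model thinks it is probably not speech AND is unsure about the text.
    if seg.get("no_speech_prob", 0.0) > no_speech and seg.get("avg_logprob", 0.0) < min_logprob:
        return True
    # Highly compressible text usually means a repetition loop.
    return seg.get("compression_ratio", 0.0) > max_compression


def filter_segments(segments, **thresholds):
    """Drop likely hallucinations and immediate verbatim repeats."""
    kept = []
    for seg in segments:
        if looks_hallucinated(seg, **thresholds):
            continue
        # Keep only the first of consecutive identical texts.
        if kept and seg["text"].strip() == kept[-1]["text"].strip():
            continue
        kept.append(seg)
    return kept
```

Dropping verbatim repeats can also remove legitimate repetition (someone really saying the same word twice), so that rule may need to be limited to very short segments.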


What I’ve tried so far:

  • Basic post-processing (merging segments based on short gaps)

  • Filtering very short segments (e.g. < 0.5s)

  • Forcing the language instead of auto-detect (improves consistency)
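For anyone comparing notes, here is a minimal sketch of the gap-based merge and short-segment filter described above. The `Cue` type, the function names, and the default values (0.3 s gap, 84 characters, 0.5 s minimum) are assumptions for illustration, not settings taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_short_gaps(cues, max_gap=0.3, max_chars=84, joiner=" "):
    """Merge a cue into the previous one when the pause between them is short
    and the combined text stays under max_chars. Expects cues sorted by start."""
    merged = []
    for cue in cues:
        prev = merged[-1] if merged else None
        fits = prev is not None and len(prev.text) + len(joiner) + len(cue.text) <= max_chars
        if fits and cue.start - prev.end <= max_gap:
            merged[-1] = Cue(prev.start, max(prev.end, cue.end), prev.text + joiner + cue.text)
        else:
            merged.append(Cue(cue.start, cue.end, cue.text))
    return merged


def drop_short(cues, min_duration=0.5):
    """Remove cues that are on screen for less than min_duration seconds."""
    return [c for c in cues if c.end - c.start >= min_duration]
```

Running the merge before the filter matters: short fragments next to real speech get absorbed into a neighbour, and only isolated fragments are dropped.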


What I haven’t fully solved yet:

  • Reliable removal of non-speech (music) without losing quiet speech

  • More natural subtitle timing (reading speed vs. strict alignment)

  • A scalable batch workflow that minimizes manual cleanup
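On the reading-speed point, one possible heuristic is to bound each cue's duration by its text length instead of trusting the raw end time. This is only a sketch: the function name and every default (15 characters per second, 1 s minimum, 7 s maximum, 1 s of linger, 0.08 s gap) are assumptions that would need tuning, especially for Thai, where `len()` also counts combining vowel and tone marks.

```python
def retime(cues, cps=15.0, min_dur=1.0, max_dur=7.0, linger=1.0, min_gap=0.08):
    """cues: list of (start, end, text) tuples sorted by start, times in seconds.
    Returns new tuples with end times bounded by an estimated reading time."""
    out = []
    for i, (start, end, text) in enumerate(cues):
        reading = len(text) / cps                         # estimated time to read the text
        shortest = min(max(reading, min_dur), max_dur)    # never shorter than this
        longest = min(shortest + linger, max_dur)         # never longer than this
        duration = min(max(end - start, shortest), longest)
        new_end = start + duration
        if i + 1 < len(cues):
            # Leave a small gap before the next cue starts.
            new_end = min(new_end, cues[i + 1][0] - min_gap)
        out.append((start, max(new_end, start), text))
    return out
```

The trade-off is that a slowly spoken line with little text will be cut before the speaker finishes, so this works best after fragments have been merged into sentence-sized cues.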


I’m especially interested in:

  • Whether people are using VAD (e.g. Silero, WebRTC) before Whisper

  • Proven heuristics for merging/splitting subtitle segments

  • Tools or pipelines that significantly reduce manual post-editing
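To make the VAD question concrete, here is a sketch of one way VAD output could be used after transcription instead of before it. It assumes the VAD (Silero, WebRTC, or anything else) has already produced a sorted list of `(start, end)` speech intervals in seconds. The function names, the 0.3 speech ratio, and the 0.1 s padding are illustrative assumptions.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two intervals (0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def gate_with_vad(cues, speech, min_speech_ratio=0.3, pad=0.1):
    """cues: list of (start, end, text); speech: sorted list of (start, end).
    Drops cues that are mostly non-speech and tightens the rest to the speech."""
    kept = []
    for start, end, text in cues:
        hits = [(s, e) for s, e in speech if overlap(start, end, s, e) > 0]
        spoken = sum(overlap(start, end, s, e) for s, e in hits)
        duration = end - start
        if duration <= 0 or spoken / duration < min_speech_ratio:
            continue  # little or no detected speech under this cue
        # Pull the cue in to the first speech onset and last speech offset.
        new_start = max(start, hits[0][0] - pad)
        new_end = min(end, hits[-1][1] + pad)
        kept.append((new_start, new_end, text))
    return kept
```

This addresses early starts and lingering ends in one pass, but it inherits the VAD's mistakes: vocals in music can still count as speech, and quiet speech the VAD misses will be dropped, which is why the ratio is kept low here.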


I’d really appreciate hearing what has worked well in practice, especially for batch workflows.

Thanks!
