Semi-automated podcast transcription

The medium of podcasting continues to grow in popularity. Americans, for example, now listen to over 21 million hours of podcasts per day. Few of those podcasts have transcripts available, so the content isn’t discoverable, searchable, linkable, reusable. It’s lost.

The typical solution is to pay a commercial transcription service, which charge roughly $1/minute and claim around 98% accuracy. For a podcast producing an hour of content a week, that would add an overhead of around $250 a month. A back catalogue of a year of podcasts would cost over $3,100 to transcribe.

When I remember fragments of some story or idea that I recall hearing on a podcast, I’d like to be able to find it again. Without searchable transcripts I can’t. It’s impractical to listen to hundreds of old episodes, so the content is effectively lost.

Given the advances in automated speech recognition in recent years, I began to wonder if some kind of automated transcription system would be practical. This led on to some thinking about interesting user interfaces.

This (long) post is a record of my research and ponderings around this topic. I sketch out some goals, constraints, and a rough outline of what I’m thinking of, along with links to many tools, projects, and references to information that might help. I’ve also been updating it as I’ve come across extra information and new services.

I’m hoping someone will tell me that such a system, or parts of it, already exist so that I can contribute to those existing projects. If not then I’m interested in starting a new project – or projects – and would welcome any help. Read on if you’re interested… Continue reading