In my previous post I evaluated a number of Automatic Speech Recognition (ASR) systems. That evaluation was useful but limited in an important way: it used a single good-quality audio file with a single pair of speakers, both of whom happened to be men with clear North American accents. Consequently it said nothing about how performance varies across different accents or levels of audio quality.
To address that limitation I’ve tested 14 ASR systems with 12 different audio files, covering a range of accents and audio quality. This post presents the results.
Back in March 2016 I wrote Semi-automated podcast transcription about my interest in finding ways to make archives of podcast content more accessible. Please read that post for details of my motivations and goals.
Some 11 months later, in February 2017, I wrote Comparing Transcriptions, describing how I was exploring ways to measure transcription accuracy. That turned out to be trickier, and more interesting, than I’d expected. Please read that post for details of the methods I’m using and what the WER (word error rate) score means.
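In brief, for readers who don’t want to click through: WER is the word-level edit distance between a reference transcript and the ASR output, divided by the number of words in the reference. Here’s a minimal sketch of that computation in Python – a sketch only, since a real comparison needs careful text normalisation first, which is a large part of what made this tricky:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions)
    divided by the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1/6 ≈ 0.167
```

Note that a lower WER is better, and that scores above 1.0 are possible when the hypothesis contains many insertions.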
Here, after another over-long gap, I’m returning to post the current results and to start thinking about next steps. One cause of the delay has been that whenever I returned to the topic there had been significant changes in at least one of the results, most recently when Google announced their enhanced models. In the end the delay turned out to be helpful.
After a pause I am working again on my semi-automated podcast transcription project. The first part involves evaluating the quality of various methods of transcription. But how?
In this post I’ll explore how I’ve been comparing transcripts to evaluate transcription services. I’ll include the results for some human-powered services. I’ll write up the results for automated services in a later post.
The medium of podcasting continues to grow in popularity. Americans, for example, now listen to over 21 million hours of podcasts per day. Few of those podcasts have transcripts available, so the content isn’t discoverable, searchable, linkable, reusable. It’s lost.
The typical solution is to pay a commercial transcription service; these charge roughly $1/minute and claim around 98% accuracy. For a podcast producing an hour of content a week, that adds an overhead of around $260 a month, and a back catalogue of a year of podcasts would cost over $3,100 to transcribe.
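For concreteness, here’s the back-of-the-envelope arithmetic behind those figures, assuming the $1/minute rate and the one-hour-a-week schedule stated above:

```python
rate_per_minute = 1.00           # typical commercial rate, USD
minutes_per_week = 60            # one hour of new content each week

weekly = rate_per_minute * minutes_per_week   # $60 per week
monthly = weekly * 52 / 12                    # ≈ $260 per month
yearly = weekly * 52                          # $3,120 for a year's back catalogue
print(f"${monthly:.0f}/month, ${yearly:.0f}/year")  # $260/month, $3120/year
```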
When I remember fragments of a story or idea I heard on a podcast, I’d like to be able to find it again. Without searchable transcripts I can’t: it’s impractical to listen to hundreds of old episodes, so the content is effectively lost.
Given the advances in automated speech recognition in recent years, I began to wonder whether some kind of automated transcription system would be practical. That led to some thinking about interesting user interfaces.
This (long) post is a record of my research and ponderings around this topic. I sketch out some goals, constraints, and a rough outline of what I’m thinking of, along with links to many tools, projects, and references to information that might help. I’ve also been updating it as I’ve come across extra information and new services.
I’m hoping someone will tell me that such a system, or parts of it, already exists so that I can contribute to those existing projects. If not then I’m interested in starting a new project – or projects – and would welcome any help. Read on if you’re interested…