In my previous post I evaluated a number of Automatic Speech Recognition systems. That evaluation was useful but limited in an important way: it only used a single good quality audio file with a single pair of speakers (who both happened to be males with clear North American accents). Consequently there was no evaluation of performance across a variety of accents and varying audio quality etc.
To address that limitation I’ve tested 14 ASR systems with 12 different audio files, covering a range of accents and audio quality. This post presents the results.