Automatic Transcription Software (1/6) – Zoom

I have previously blogged about the New Project: Evaluation of Transcription Software where I looked at various tools available that could help us create automatic transcripts for audio and video.

The new legislation on accessibility coming into effect from September 2018 will (hopefully) make institutions take accessibility more seriously, and the need to provide transcripts and captions will increase. So I am hoping that this blog series will be of use to others who are looking at automated transcription tools.

An image of a script

I collected eight voice recordings from a purposely selected sample of UCEM employees with different English accents, each reading a 1000-word script containing paragraphs from various subject disciplines related to the built environment. I generated transcripts for these recordings using automatic transcription services and then checked the accuracy of the transcriptions. In this blog series I will share what I found when trying out different transcription tools with these recordings.

While I have been looking at tools for automatic transcription, my colleague Graham has been looking at tools for webinars. Zoom, a video communication tool that we were considering for webinars, also provides an automatic transcription service for enterprise users. We tried it out with a test account and I have been checking the quality of the transcriptions.

As Zoom is a communication tool, I had to create a meeting, start the recording (record to Cloud) and then share my desktop with the options for sharing computer sound and optimising for a full-screen video clip switched on. I then played each video on my computer so that it would be recorded and transcribed by Zoom.

Within 15-20 minutes of the meeting ending, the host receives an email from Zoom with a link to the recording and transcript. I then downloaded the transcript, removed the timestamps and other non-text elements, and compared it against the original text read by the participants using Microsoft Word’s Review > Compare functionality, followed by a manual check. One recording was dropped at this point because its quality was poor and including it would not have given a like-for-like comparison.
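The clean-up step could also be scripted rather than done by hand. The following is only a rough sketch, and it assumes the downloaded transcript is a WebVTT-style text file (a "WEBVTT" header, numbered cues and "start --> end" timing lines); the function name strip_vtt is made up for illustration, and speaker labels, if present, are not handled.

```python
import re

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.500".
# The hours part is optional, as WebVTT allows "mm:ss.ttt" as well.
TIMING = re.compile(
    r"^\s*(\d{2}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d{2}:)?\d{2}:\d{2}\.\d{3}"
)


def strip_vtt(text: str) -> str:
    """Return only the spoken text from a WebVTT-style transcript.

    Assumption: cue numbers sit on their own line as plain digits, so a
    spoken line made up only of digits would be dropped as well.
    """
    kept = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the file header, cue numbers and timing lines.
        if not line or line == "WEBVTT" or line.isdigit() or TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "1\n00:00:01.000 --> 00:00:04.000\nthe built environment\n\n"
        "2\n00:00:04.000 --> 00:00:06.500\nis a broad subject\n"
    )
    print(strip_vtt(sample))
```

The plain text that comes out can then be pasted into Word for the Review > Compare step as before.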

Transcript accuracy was checked using the measure Word Error Rate (WER), which was calculated using the formula:

WER = (Substitutions + Deletions + Insertions) / N

where N is the total number of words in the reference transcript (Apone, Botkin, Brooks, & Goldberg, 2011).
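For anyone who would rather compute this than count errors by hand, here is a minimal sketch of the formula. It is not the method used for the results in this post (those came from Word's Compare plus a manual check). It finds the smallest number of substitutions, deletions and insertions with a word-level edit distance, and it assumes words are separated by whitespace and compared case-insensitively, with punctuation left in place. The function name word_error_rate is my own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / N,
    where N is the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits needed to turn the
    # reference words seen so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick brown box"))
```

Because the script minimises the total number of edits, its count can differ slightly from a manual tally where a reviewer judges which words were swapped and which were dropped.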

WER is not a great measure of accuracy because it treats all words as equal. With that reservation, the results for Zoom automatic transcription are shown in the graph. Here I have calculated the WER and taken one minus WER, expressed as a percentage, as the accuracy rate.
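As a quick illustration of the accuracy calculation, the sketch below turns error counts into an accuracy percentage. The counts in the example are invented purely to show the arithmetic and are not results from my recordings; the function name accuracy_percent is also just for illustration.

```python
def accuracy_percent(substitutions: int, deletions: int,
                     insertions: int, n_reference_words: int) -> float:
    """Accuracy rate = (1 - WER) expressed as a percentage."""
    if n_reference_words <= 0:
        raise ValueError("the reference transcript must contain words")
    errors = substitutions + deletions + insertions
    # Algebraically the same as (1 - errors / N) * 100.
    return 100 * (n_reference_words - errors) / n_reference_words


if __name__ == "__main__":
    # Hypothetical counts for a 1000-word script: 100 errors in total.
    print(accuracy_percent(60, 25, 15, 1000))
```

One thing to keep in mind is that insertions count as errors too, so a very poor transcript can have more errors than there are reference words, which would make the accuracy rate negative.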

Graph showing accuracy rate of Zoom automatic transcription

The highest accuracy rate was recorded for a non-native speaker’s recording. When these automatically created transcripts were presented to subject experts with the question “are these good enough as an accessibility aid?”, the unanimous decision was that they weren’t good enough. However, they could be a good first draft to work from when creating an accurate transcript.

When I get time I will blog about the other automatic transcription software we tried out.


Apone, T., Botkin, B., Brooks, M., & Goldberg, L. (2011). Caption Accuracy Metrics Project: Research into Automated Error Ranking of Real-time Captions in Live Television News Programs. Retrieved from