Transcription – will it be replaced by voice recognition software?

Aug 11, 2007, 10:17

Anne Hickley PhD

This article indicates a number of reasons why transcription will not be replaced by voice recognition software, at least for the foreseeable future. Well, maybe as a professional transcriptionist I could be considered biased, but there are issues with voice recognition software that I don't think will be solved in the next few years.

What is voice recognition software?

Voice recognition software (the best known example is probably Dragon Naturally Speaking by Nuance) is software that allows the operator to speak to the computer, either to give it commands (e.g. to save a file) or to dictate text. Over the past ten years it has been hailed a number of times as the answer to fast and accurate dictation – but we’re still waiting. The latest version of Dragon claims to be 99% accurate and three times faster than typing, so I recently purchased a copy to see for myself.
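To put the 99% figure in perspective, here is a rough back-of-envelope sketch. It rests on two assumptions that are not from the vendor or from my own measurements: that "accuracy" is counted per word, and that a speaker averages about 150 words per minute. The function name and its parameters are purely illustrative.

```python
def expected_errors(minutes_of_speech, words_per_minute=150, accuracy=0.99):
    """Estimate how many words would be misrecognised.

    Assumes accuracy is measured per word; the default of 150 words
    per minute is an assumed speaking rate, not a measured one.
    """
    total_words = minutes_of_speech * words_per_minute
    return round(total_words * (1 - accuracy))


# One hour of speech at the assumed rate is 9,000 words.
print(expected_errors(60))  # 90
```

On those assumptions, even software that lives up to the claim would leave something like 90 words to find and correct in every hour of recording, which is why the proofreading time matters so much.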

Dictation only?

It is important to remember, especially if you are producing interviews rather than one-person dictation, that the software will only recognise the one voice it’s been trained on. If you try to record an interview into it, it will pick up your voice but not the interviewee’s. It is possible to re-speak an interview: you listen to the recording and speak the words of both interviewer and interviewee a few words after they’re said. However, that’s an art in itself, and even a trained court stenographer would struggle to listen to an interview straight through, without pausing, and do this, so how much time does it really save?

Why is this software not about to take over the world?

At a recent seminar given by a seller of digital dictation systems and associated voice recognition software, the presenter indicated that voice recognition would probably only work well for three in ten people. This is because the software first has to be trained to recognise your voice (a process which really does only take a few minutes) and then has to be continuously corrected and trained on words that it’s either not familiar with or simply doesn’t recognise from your speech patterns. That second stage is an ongoing process and does take time.

I used this software first about five years ago and I have to say it’s come on in leaps and bounds since then, but even the latest version can be a struggle! I purchased the latest version on the advice of a potential client who claimed that even re-speaking his interviews (listening to a recording and then speaking the words into the software as they were said) he could get an hour of recording down in 2.5 hours rather than the four hours per hour it usually takes to transcribe. He asked if I would be prepared to use a voice recognition program and re-speak his files for him at a rate that reflected the 2.5 hours per hour. I agreed to purchase the software as I’d been toying with the idea of doing so anyway, and do a trial tape for him. I said I wasn’t prepared to agree to 2.5 hours per hour until I’d proved it to myself, and he accepted this.

Funnily enough, several months on, I’m still waiting for that trial tape. Having experimented with the software, I find that re-speaking actually takes me … guess how long? Once the checking has all been done, about four hours per hour of recording!

I do use the software now and then, because in a busy week it saves the fingers and wrists, but what it doesn’t save is time! When I’m transcribing I proofread as I type, so only a quick skim through is required afterwards to make sure no errors have sneaked through. When using the software for straight dictation one can probably proofread along the way too, and there may be a little time saved. One cannot do so while re-speaking, though, because one is concentrating on listening to the words being said and repeating them a few words later, while trying to keep the recording playing continuously.

Also, because the speech recognition software is not intelligent, it often doesn’t know which of several words that sound alike to use. This is something that has been vastly improved over the years, and in simple sentences it is often capable of working out whether to, two or too is required, or whether it’s here or hear. However, in a longer and more complex sentence it frequently struggles with this, and very careful proofreading is required afterwards.

Also, when you’re dictating you don’t really want to have to use the keyboard as well, so the software allows you to speak commands for making words bold, underlining them and so on. However, you need to leave a small pause between the command and general dictation, or the software just thinks the command is part of the words being dictated. That also slows things down compared with a professional transcriber using keyboard shortcuts for these commands. And re-speaking an interview is even worse, because every change of speaker needs to be indicated, either with the name or initials and a tab, or with the name and a new paragraph, so there are a number of commands involved around that.

To save time the latest version of Naturally Speaking will actually try to punctuate for you … so you dictate and it puts the commas and full stops in the right places. That’s the theory anyway. Suffice it to say that I wouldn’t recommend it!

Conclusion

All in all I’d say Dragon 9, the latest version, is a very useful tool, but it will be some time before it’s really ready to replace transcriptionists.
