Speech to text is a technology with far-reaching implications

Dec 8

08:24

2011

Sharad Gaikwad

Speech recognition, also known as automatic speech recognition, computer speech recognition, speech to text, or just STT, translates spoken words to text.

The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker, as is the case for most desktop recognition software. Recognizing the speaker can simplify the task of translating speech. Speech recognition is the broader term: it refers to technology that can recognize speech without being targeted at a single speaker, such as a call system that can recognize arbitrary voices. Speech recognition applications include voice user interfaces such as voice dialing, call routing, domotic (home automation) appliance control, search, simple data entry, preparation of structured documents, speech to text processing, and aircraft cockpit control.

Since the early days of speech to text technology, there have been many applications in different fields. The technology has simplified many tasks for professionals.

Healthcare

In the health care field, even as speech recognition (SR) technologies improve, medical transcriptionists (MTs) have not yet become obsolete. Their services may be redistributed rather than replaced. Speech recognition can be implemented at the front end or the back end of the medical documentation process. In front-end SR, the provider dictates into a speech to text engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and validating the document. It never goes through an MT/editor. In back-end SR, also called deferred SR, the provider dictates into a digital dictation system, the audio is routed through a speech recognition engine, and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is currently widely used in the industry.
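The two workflows described above can be sketched in a few lines of Python. This is only an illustration of who finalizes the document in each case, not any vendor's implementation: the `Report` class, the function names, and the `recognize`, `provider_edit`, and `mt_edit` callables are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Report:
    text: str
    finalized_by: str               # "provider" or "mt_editor" (hypothetical labels)
    audio: Optional[bytes] = None   # original voice file, kept only in the deferred flow


def front_end_sr(audio: bytes,
                 recognize: Callable[[bytes], str],
                 provider_edit: Callable[[str], str]) -> Report:
    """Front-end SR: the provider sees the recognized text and edits it directly."""
    draft = recognize(audio)
    # The dictating provider validates the document; no MT/editor is involved.
    return Report(text=provider_edit(draft), finalized_by="provider")


def back_end_sr(audio: bytes,
                recognize: Callable[[bytes], str],
                mt_edit: Callable[[str, bytes], str]) -> Report:
    """Back-end (deferred) SR: the draft and the voice file both go to an MT/editor."""
    draft = recognize(audio)
    # The MT/editor corrects the draft while listening to the original audio.
    final_text = mt_edit(draft, audio)
    return Report(text=final_text, finalized_by="mt_editor", audio=audio)
```

In this sketch the only difference between the two paths is who performs the final edit and whether the original voice file travels with the draft, which mirrors the distinction drawn in the paragraph above.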

Military

High-performance fighter aircraft - Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), a program in France installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. Speaker-independent systems are also being developed and are in testing for the F-35 Lightning II (JSF) and the Alenia Aermacchi M-346 Master lead-in fighter trainer. These systems have produced word accuracies in excess of 98%.
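For readers unfamiliar with how a word accuracy figure like the one above is usually obtained, here is a minimal sketch. It assumes the common definition, one minus the word error rate, where errors are counted as the minimum number of word substitutions, insertions, and deletions needed to turn the recognizer output into the reference transcript. The article does not say how the 98% figure was measured, so the function names and the sample command below are purely illustrative.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum word-level substitutions, insertions and deletions (edit distance)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (word errors / number of reference words)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / ref_len


# Hypothetical cockpit command with one misrecognized digit out of eight words.
spoken = "set radio frequency one two one point five"
recognized = "set radio frequency one two nine point five"
accuracy = word_accuracy(spoken, recognized)
```

Because insertions count as errors, this measure can drop below zero when the recognizer outputs many extra words, so an accuracy above 98% means fewer than two errors per hundred spoken words.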

Telephony and other domains

Speech to text in the field of telephony is now commonplace, and in the field of computer gaming and simulation it is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, speech recognition in the field of document production has not seen the expected increases in use. The improvement of mobile processor speeds made speech-enabled Symbian and Windows Mobile smartphones feasible. Speech is used mostly as part of the user interface, for creating pre-defined or custom speech commands. Leading software vendors in this field are: Microsoft Corporation (Microsoft Voice Command), Digital Syphon (Sonic Extractor), Nuance Communications (Nuance Voice Control), Speech Technology Center, Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator), and SVOX.

It is safe to say that at some point in the future, speech to text may become speech understanding. The statistical models that allow computers to decide what a person just said may someday allow them to grasp the meaning behind the words and respond. Although this would be a huge leap in terms of computational power and software sophistication, some researchers argue that speech recognition development offers the most direct line from the computers of today to true artificial intelligence. We can talk to our computers today. In 25 years, they may very well talk back, and that artificial intelligence may be put to many far-reaching uses.
