Speech Recognition

From Citizendium
imported>Samuel C. Smith

Revision as of 08:59, 24 July 2008

This article is developed but not approved.

Speech recognition is one of the main elements of natural language processing, or computer speech technology. It is equivalent to taking dictation: converting spoken language into text or other data that a computer can process. Humans do this seemingly without effort, but computers need formidable processing and algorithmic resources to do the same.


History of Speech Recognition

Writing systems are ancient, going back more than 5,000 years to the Sumerians. The phonograph, which allowed the analog recording and playback of speech, dates to 1877. Speech recognition, however, had to await the development of the computer, because recognizing speech poses several distinct problems.

First, speech is not simply spoken text, in the same way that Miles Davis playing 'So What' can hardly be captured by a note-for-note rendition as sheet music. What humans understand as discrete words with clear boundaries is actually delivered as a continuous stream of sounds: 'Iwenttothestoreyesterday' rather than 'I went to the store yesterday'. Second, there is no one-to-one correspondence between sounds and letters. English has slightly more than five vowel letters: a, e, i, o, u, and sometimes y. It has roughly twenty distinct vowel sounds, though, and the exact count varies because of another problem.
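
The word-boundary problem can be illustrated with a small sketch. It makes a strong simplifying assumption: letters stand in for sounds, whereas a real recognizer works on audio. The function name segment and the toy vocabulary are hypothetical, chosen only for this illustration. Given an unbroken stream and a list of known words, the sketch finds every way to split the stream into words.

```python
def segment(stream, vocabulary):
    """Return every way to split `stream` into words from `vocabulary`."""
    # results[i] holds all segmentations of the prefix stream[:i].
    results = [[] for _ in range(len(stream) + 1)]
    results[0] = [[]]  # exactly one way to segment the empty prefix
    for end in range(1, len(stream) + 1):
        for start in range(end):
            word = stream[start:end]
            if word in vocabulary:
                # Extend every segmentation that ends where this word starts.
                for prefix in results[start]:
                    results[end].append(prefix + [word])
    return results[len(stream)]


# Hypothetical toy vocabulary; 'tot' and 'he' are included to show ambiguity.
VOCABULARY = {"i", "went", "to", "the", "store", "yesterday", "tot", "he"}

for words in segment("iwenttothestoreyesterday", VOCABULARY):
    print(" ".join(words))
```

Even with this tiny vocabulary the stream has more than one valid reading ('to the' and 'tot he'), which suggests why finding boundaries alone is not enough: something further, such as knowledge of which word sequences are likely, has to choose among the candidates.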

Everyone speaks a dialect of a language. The word 'water' could be pronounced 'watter', 'wadder', 'woader', 'wattah', and so on. Pronunciation is also colored by adjacent sounds, by the speed at which the speaker is talking, and even by the speaker's health: consider how pronunciation changes when a person has a cold.
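
One simple way to picture the dialect problem is a lookup table that lists several pronunciations for each word. The sketch below is only an illustration under stated assumptions: the informal spellings from the paragraph above stand in for phonetic transcriptions, and the names PRONUNCIATIONS and words_for are hypothetical.

```python
# Hypothetical lexicon: each word maps to the pronunciation variants we know.
# The informal spellings stand in for real phonetic transcriptions.
PRONUNCIATIONS = {
    "water": {"watter", "wadder", "woader", "wattah"},
}


def words_for(heard, lexicon):
    """Return, in sorted order, the words that list `heard` as a variant."""
    return sorted(word for word, variants in lexicon.items() if heard in variants)


print(words_for("wadder", PRONUNCIATIONS))  # a listed variant
print(words_for("wodder", PRONUNCIATIONS))  # a variant the lexicon has never seen
```

The second lookup returns nothing, which shows the limit of a fixed list: any pronunciation that was not anticipated, whether from an unfamiliar dialect, fast speech, or a speaker with a cold, is simply not found.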

Speech Recognition Now

The Future of Speech Recognition