QUOTE (mcrose @ Dec 12 2009, 01:46 PM)
Is the Dragon iPhone App the wave of the future? I was impressed with its accuracy and speed of processing.
Just wondering what the rest of you guys think....
The Dragon iPhone app is not performing any speech recognition on your iPhone. It's an interface that connects to the Dragon NaturallySpeaking transcription server through your cell phone service provider, just as if you were calling a friend for a normal phone conversation. In other words, it dials a phone number (Nuance), transmits what you say as if you were talking to another person, and your speech is transcribed at the Nuance end as if you were dictating directly into Dragon NaturallySpeaking. In essence, it works like an answering machine, except that instead of recording what you say, it engages the Dragon NaturallySpeaking transcription server and transcribes your dictation on the fly. The results of that recognition are then returned to you the same way, except as text, just as if the friend on the other end were texting you back after listening to what you said.
This type of transcription has been available for years. Various companies have set up methods whereby doctors can dictate their reports by phone directly into a workflow-based transcription server, after which the transcription (text) is converted into a document and sent back to the doctor as a file. The difference here is that instead of sending the transcription back to you as a file, it is sent back using the texting capabilities of your cell phone. The transcription services available to doctors are more sophisticated, as well as more complex, but the principle is basically the same. This app is just making use of the capabilities of smartphones, in this case and at this time specifically the iPhone; at some point it will be available for most smartphones. Nevertheless, it basically works as if you were dictating remotely over the phone.
Basically, when you hit the record button, the application dials the Nuance phone number and listens to what you say, then sends the text back to you. Since it only takes a few milliseconds to perform the transcription, and since most users aren't going to sit there and dictate for half an hour, the process is very quick. It's also highly accurate, because every time you use the application, everything you say is used as data for improving the acoustic model on the server end. Everybody who uses the service contributes to further improving its accuracy, which is why it's free at this point. When Nuance feels that it has acquired sufficient data, it will begin offering it on a paid basis for a specified monthly fee. Right now, Nuance is providing it free because you're scratching their back by providing the acoustic data they are interested in acquiring. Since this is designed to be basically a speaker-independent service, a large corpus of acoustic data is necessary to improve accuracy across many types of speakers. Even when it is offered for a monthly fee, the data will still be acquired for this purpose. At this point, Nuance is simply performing a long-term beta test and offering it to you at no charge.
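To make the flow concrete, here's a minimal sketch of the round trip described above. All of the names are invented for illustration; this is not Nuance's actual API, just a mock of the pattern: the client captures your speech, hands it to a remote recognizer, and gets plain text back instead of a recording.

```python
def mock_transcription_server(audio_chunks):
    """Stand-in for the server-side recognizer. For illustration we pretend
    each 'audio' chunk decodes to exactly one word and join the results."""
    decoded = [chunk["word"] for chunk in audio_chunks]
    return " ".join(decoded)

def dictate(audio_chunks, server=mock_transcription_server):
    """Client side: 'dial' the server, stream the captured audio to it,
    and return the transcript it sends back -- text, not a voicemail."""
    return server(audio_chunks)

# Fake 'audio'; in a real system these would be compressed speech frames.
utterance = [{"word": "dragon"}, {"word": "dictation"}, {"word": "test"}]
print(dictate(utterance))  # -> dragon dictation test
```

The point of the sketch is only that the heavy lifting (the recognizer) lives on the server; the phone is a thin capture-and-display client, which is why any handset with a connection can use the service.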
Nothing special, except for the methodology and the fact that it is a public service. If you want to see how this has been used over the last few years, take a look at vendors like CustomsSpeechUSA, which provides this type of workflow and transcription service to doctors, lawyers, transcriptionists, etc. Also, Nuance is not the first to come out with this. IBM and VoxForge have been providing this type of service for a fee for a little more than a year now. However, their approach is not restricted to smartphones; anyone with a cell phone can opt into it. The difference there is that the results are sent back to your e-mail, because the average cell phone can't multitask like smartphones can.
Is it the wave of the future? One of them. If you want to look down the road, the technology is moving in the direction of the Star Trek computer. Within the next 10 or 15 years, you will carry your personal computer in your pocket just like your iPhone, except that it will run on fuel cells lasting three or four years constantly on, have the power of today's supercomputers, and be always connected to the Internet via your ISP, which will also provide your phone service. You'll wear your video display as a pair of glasses, or sunglasses, that work like the heads-up displays available in automobiles but are as sophisticated as those in modern aircraft (i.e., military jets and the new Boeing 787). That is the technology of the future, and speech recognition will be its major user interface. Why? Where are you going to stick your keyboard, unless you have a pair of jeans with an oversized back pocket? Even touch keyboards on your cell phone, or in this case your computer, will be an incredible time waster when speech is much faster and, at that point, pretty close to 100% accurate most of the time.
Technical Project Manager
We live in a society exquisitely dependent on science and technology, in which hardly anyone knows anything about science and technology. - Carl Sagan