Monday, September 15, 2008

Slurring our words

Today I did my first experiment with Windows Vista voice recognition.  This post is being created by voice, not by typing.  Dictation by voice is not the same as ordinary speech.  First, there are commands for punctuation and error checking that one would not say in a normal conversation.  More importantly, there are errors that need correction.  The errors slow you down.  So tracking the errors makes this something other than regular talking.  Vista does not like the way I say the word “errors”.  It is a bit frustrating learning how to speak this way.  And focusing on how speech recognition works makes it harder to focus on the subject matter at hand.

The fantasy is to do a voice recording and have speech recognition make a transcript of the voice recording.  I don’t think the Vista software is good enough to deliver on that fantasy.  But it is pretty impressive how well the dictation works on its own.  I believe I could get used to this with some practice.  The real question is whether I can stick with it to get enough practice in while I am still not very good.  That is an open question.

I have Dragon NaturallySpeaking on order to see how it compares with Vista and to see if it can fulfill the fantasy.  In the meantime I hope to produce some more documents with voice recognition.  We need to understand the strengths and limits of this technology.

I was overly optimistic and, when first dictating, had Audacity record the audio at the same time I was speaking.  I became very self-conscious, worrying about the recording and not the dictation.  One needs to be relaxed but articulate to do these recordings.  I need more practice before I can record and still do as well as I do with only the speech recognition running.  And I’m cheating here by occasionally going to the keyboard and using that instead of speech.  So this is far from useful yet.  But I think it’s still worth trying.

I first tried Dragon NaturallySpeaking on a laptop that had a 300 MHz CPU.  The computer wasn’t powerful enough to run the software.  But now there’s been a tenfold increase in CPU power.  So I would hope the software can run well.  It is actually quite impressive to watch the text being generated from the voice.  However, the reliability is still not good enough to use as a true substitute for typing.  So this is likely to remain a curiosity for quite a while.

Until the fantasy is fulfilled, we need other alternatives to make our multimedia content accessible.  And the simplest one is manual transcription of the audio track.  That will have to suffice for now.
