Monday, 12 November 2012

Speech recognition software and writing

Speech recognition software automatically transcribes speech into digital text, with over 90% accuracy in ideal conditions. It can save time typing, and it also cuts back on repetitive motions, lowering the risk of RSI or easing it if you have it already. Once trained to your voice, the software is much faster than typing, even when you count the time spent making corrections (many of which can be done on the fly using voice input, which further trains the software and improves its accuracy in the future).

There are two main scenarios in which I've used it:

  1. Sat at my PC, using a microphone to open and close programs, type in text etc. I can then sit there and 'write' just by speaking. Not only is it quicker, but it taps into a different type of creativity - by speaking and letting it flow I find that I am applying less conscious control than when I write normally, a chance to let surprising word combinations come out. I can also sit back, eyes closed, picture the scene and talk away, describing what I see and translating the imaginary scenes into spoken words.
  2. I've written about 'snippets'* before. These are files on my PC that store fragmented sensations, ideas, word plays, descriptions and so on. Often these occur to me on the move. In some cases I jot them in a notebook to type up later, but when walking along that's not convenient, so instead I record the thoughts as audio files on my smartphone. Typing them up by hand when I get home takes multiple passes, since I speak faster than I type! Using audio transcription software to turn them into text documents is much quicker.

In many cases you probably have all that you need on your PC already - see this useful BBC guide to using the tools built in to your operating system. Otherwise there are commercial and free software options (e.g. see this list of software).

Dragon NaturallySpeaking - my thoughts
Recently I have been using a version of Dragon NaturallySpeaking. It proved to be useful and fairly fast, letting me speak and have the words transcribed as text, or control my PC. It required extensive training to get the accuracy up, and there is a bit of a learning curve in getting used to the software (many of the tips and instructions I needed came from third-party help guides on the Web), but after that I didn't have any problems. In fact, I was quite impressed at how far speech recognition has come since I last looked at it about seven years ago. It is really handy for when I have hand-written notes - normally I put off the job of typing them up because I know it will be so time-consuming, leading to sheets of paper piling up and making me feel bad. Once I started to use this software in earnest it was much quicker, and I soon got the piles under control again by reading everything out and then just correcting the errors.

In terms of errors, it is usually unfamiliar words and names that confuse the software, e.g. it interpreted the Welsh place name Eglwysfach as 'a glorious flash'. The software is incredibly prudish by default, obviously built that way to avoid accidentally upsetting anyone, but when you're writing stories the prudishness can be a pain since it leads to more corrections being needed. One story I wrote had all of the 'fuck off!' statements appear as 'for cough!' The software also has problems with non-standard words like 'Drrring!', and annoyingly I've often had to go and correct the punctuation (e.g. it would interpret 'quotation marks' as being single marks, not double). I don't point these things out to put anyone off; they just give a flavour of the areas where the software stumbles.

<rant>One thing that did put me off the version of Dragon I bought was that when I got it home I found out that there was a requirement for annoying registration (requiring a scan of some form of identification and a delay of days). Ridiculous! Seeing as I owned a legal copy I had no qualms about downloading a crack that bypassed all the DRM and let me use my software straightaway. It shows how pointless the DRM is, since it just inconveniences customers, creates bad feeling, and doesn't prevent piracy at all.</rant>

I had intended to incorporate a screencast of me using the software, to show how it can be used by writers. Unfortunately, when I upgraded my PC to Windows 7 the inbuilt microphone on my Medusa 5.1 headphones stopped working. I don't know if it is a sound card driver, Medusa, or Windows 7 problem (or a combination of all three), but until I can get it working or buy a new microphone I won't be able to use the speech recognition software for this. Foiled!

Express Scribe
I've also used the free version of this in the past. The thing I like about using Express Scribe Transcription Software (with the default Windows 'Microsoft Speech Recognizer') is that it is better than Dragon for analysing pre-recorded audio files (Dragon is best for live transcription at the PC). So this is my software of choice for audio files recorded on the move.

What's the quality of transcription like?
Most of it comes down to two factors:
  1. Having a good quality recording environment.
  2. Training the software.

With regard to the first factor, the software needs to hear clear sounds. A bad microphone or a noisy environment interferes with clarity, leading to more errors. The audio files from my phone are particularly bad for this since they're usually created while walking down a busy road, and the phone even picks up the breeze blowing across the microphone as I walk briskly along! In those circumstances even the best speech recognition software will fail more often than not.

As to training, it usually involves reading out passages so that the software can learn how you pronounce words. It is ongoing, since every correction you make also helps it to understand you and become more personalised and accurate for your voice. This increases the importance of being able to back up all the settings files so that when you have to install the software again (e.g. new PC, reinstalled operating system etc.) you don't have to repeat all the time-consuming training and personalisation. I must admit that Dragon didn't make it easy to find and back up all the data; it took a bit of digging around.

Again, I'm sad that I can't do a screencast showing the tools in action. However, I will show how strange it can be in a worst-case situation. Here are some audio snippets I recorded on my phone then ran through an untrained version of Microsoft's Speech Recognizer, i.e. the worst way of doing things! They amuse me in the way they resemble an alien speaking. I intend to use this method whenever I am stuck for inspiration: speak about some random topic, using a bad microphone or noisy environment, then let untrained software transcribe it. Then I can pick out words and phrases I like, things that are unexpected combinations, and see what that leads to. Here are the results of my 'bad' experiment.

File 1
What the computer transcribed: 
Providing an alias are less John including her cage and explain how the in-and it was for the their lives is determined to and fro in land line and it wasn't for him
What I actually said: 
She knows what he was like. How he was strong and confident in front of the kids when he had to explain that they had to have the dog put down and it was for the best; then as soon as he was on his own couldn’t stop crying. He was a dad. He wasn’t callous.

File 2
What the computer transcribed: 
If this two-to find it hard question every something that had happened to her if
What I actually said: 
It’s the first duty of every civilised being to question every assumption, and everything it has ever been told.

File 3
What the computer transcribed: 
The then I wasn't here point out that the Tories productivity who wanted then you can get into a a gill encountered 19 the new owners abroad negativity the ones that you can't do that ranging England make them work on their dividends and the world and added that in your own living in July he remained until his brief the inspirational quickly quashed than than led to the next read in open court international the young men's rooms and money and that the oh loading and unloading other people would need giving us and the poor quality learning can change things 81 the induction into the World Cup will be of interest in the the wanted list and enhance has rung the you know you're never alone with her way you can change the hartlepool united didn't get harder and harder to but years younger than that productivity and that more than a decade in the words
What I actually said: 
In all of us there are two voices: the voice of positivity, the one that says, “You can do it; do try that, it sounds exciting and new”. Then there’s the voice of negativity, the one that says, “You can’t do it; don’t try and change things, you’ll only make them worse.” The problem is, if you listen only to the voice of negativity you will never change your life; even when you feel these brief bursts of inspiration you’ll quickly quash them, fading back into a negativity of bitterness and resentment which eventually manifests itself as self-loathing and anger against other people.  Whereas if you listen to the voice of positivity then it can change things, make your life go in directions you never would have thought possible. Another interesting thing is that the one you listen to the most gets stronger. So although you’ll never reach a point where you can’t change even if you wanted to, it does get harder and harder, so you’ve got to strengthen that positivity, listen to it more than the negative voice.

File 4
What the computer transcribed: 
Air filled her with him on the foetal brain for the murder of couldn't have who confided in her life performers who think they have always- and- more than anything -and left us with-and if necessary --fifth of can official who claimed the life- of some the time I can autonomy for the if he does, and have had a this and legs will have- of- and there are different from the wonderfully on in the group is hoping to take the have- and is continually and has the into the in and political life has taken links between to the only important effect -and-after leading to 1003 -for example income for the needs of Hawaiian after and his contemporary items has often left for Washington confirmed the and full-lines for the super-taped it the thinking that if the a list of us to conquer next February 11 continuously and the of into loss of office with whom he from N
What I actually said: 
He walked home with the rain falling in the dark, crinkling off his hood, everything reflected in the ground, blurred. What was he thinking? What was he thinking? He wasn’t thinking of the one he’d left. He walked in black bitterness, bitterness of rain. What had he been thinking? The ground shiny orange on black tar, chemicals on black tar. He was cold, legs wet, thinking about the difference between a man and a boy. Thinking back. Regret. Regret. Red light here, discoloured street light, reflections were fairy ground lights in pink and red, fairground like a boy staring at the lights in the rain, surrounded by noise and shiny things. Every single thing that boy wanted wasn’t worth it. The candyfloss was old in the tank. The rides just made him feel ill. Afterwards there was nothing left. What was he thinking at the start? Always the lights. Lights, smells and tastes. Always thinking it would be better, and always sick to the stomach. Cold legs, wet rain. A man walking down the street, going back to an empty house. What was he thinking?
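
If you wanted to put a rough number on how far off a transcription like these is, one common measure is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the transcription into what was actually said, divided by the number of words actually said. Below is a minimal Python sketch of that idea, using File 2 as the example. The function names and the crude tokenising are illustrative assumptions; this is not something Dragon or Express Scribe provides.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    # Treat curly apostrophes like straight ones so "It’s" stays one word.
    return re.findall(r"[a-z0-9']+", text.lower().replace("’", "'"))


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference text contains no words")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # spoken word missing from transcription
                current[j - 1] + 1,      # extra word in transcription
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# File 2 from above: what was said versus what the computer transcribed.
said = ("It’s the first duty of every civilised being to question "
        "every assumption, and everything it has ever been told.")
heard = ("If this two-to find it hard question every something "
         "that had happened to her if")
print(f"WER for File 2: {word_error_rate(said, heard):.2f}")
```

A score of 0 means a perfect transcription. Note that WER can go above 1 when the software adds lots of extra words, which is quite possible with recordings as noisy as these.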

* Addendum: more on the 'snippets' concept
For these 'snippets' (which can easily be kept print-only if you prefer) there is also the option of using a private blog as a means of organising them - one snippet per blog post. They will be automatically sorted and retrievable by date; tags you apply will make content identifiable; and a heading acts as a good description. It can be a great way to organise notes, and combines them with diary elements in some cases. You can also then embed photos etc.


  1. Interesting post.
    Nice pics!
    And, just a query, but do you think the Mancunian accent is harder for them to understand than RP? We could do a test....

  2. I don't know whats you means, our kid, innit? Shouldn't be 'ard to make it out, knoworramean?