Speech Recognition

Quick. What's one of the greatest resources you and I take for granted hundreds of times every day?

Want a hint?

It's our voice.

Humans evolved to use language because basic sounds and non-verbal cues only took us so far. We've become so much more with language than we ever could have been without it.

However, our tendency to value verbal communication over other forms might be limiting our evolution. And technology may be amplifying the factors preventing us from evolving into better humans.

Specifically, I'm talking about speech recognition (or “SR”).

It goes by other names, such as voice recognition, and it's the technology behind things like:

  • Universal or automatic translation devices
  • Vehicle recognition systems
  • Video game systems like Microsoft's Kinect
  • Interacting with your computer, phone or tablet
  • Military planes
  • Transcription

It's also a tool for the disabled to interact with the world in ways that would otherwise be impossible.

Seriously useful stuff, right?

So how can something so great have such a dark side?

What's at Stake

“Striving toward silence…(is) about spending as much time as possible in environments that don't necessitate a deadening of the senses. Yet, it's also largely been an attempt to shut off the mental chatter, to forget putting words to anything altogether for a few minutes.” – Mark Sisson

Imagine for a moment you're on a bus, train, subway or airplane. Only it's not the orderly scene you picture in your mind.

There is a chorus of voices and every single person is talking at once. Are they talking to each other? Giving instructions to one another?


They are using speech recognition on their laptops, tablets and smartphones to dictate a paper for school, send a message to a friend, “write” code for a project, or just because they can.

So you put on your noise-cancelling headphones because you can't possibly focus on what you're doing with all those voices. What other choice do you have to eliminate the annoying, irrelevant chatter around you?

At this point I'd ask myself:

  • Why should I need countermeasures to combat the voices of all these other people talking?
  • Why should I have to take an extreme action (intentionally disabling my sense of hearing) to avoid the noise?

Look. I'm no alarmist. I love technology and I expect an incredibly bright future for humanity. But I worry about quality of life and how adopting technology will impact it.

I see a major “be careful what you wish for” SR collision course coming.

It's not all doom and gloom. Siri for your iPhone, Dragon NaturallySpeaking for your laptop, Google's native SR for your Android tablet… all awesome stuff.

We've come a long way in sixty years from just being able to recognize the numbers zero through nine. But there's a lurking threat and I believe we need to focus on it.

Some of the Big Worries

What's the difference between asking a person next to you for information and asking your phone?

For starters, when you talk to a person you're more aware of other people around you, the volume of your voice, how appropriate your language is, and most social norms.

Talking to a phone disables the normal regulation of these things. You are also much more likely to say or ask something of your phone that you never would of the stranger next to you.

In addition, most current SR products are created and tested by people with North American accents. Ever hear someone without one trying to get the technology to interpret their words properly?

Obviously this will improve over time. The assumption that everybody in America speaks with an American accent will fade away. But this only partially solves the problems of an ever-expanding English language.

Each weakness, like loud background noise, dependence on computer processing power, and homophones (e.g. to, too and two), will be solved in the future. Like I said, we've come a long way in sixty years but we still have a long way to go.

You wouldn't know that from watching this Google Mobile clip though.

It all seems so easy and benign!

The End of Whispering

And then there is my friend, whispering.

As obstacles to efficiently using SR are solved, people are going to use it more and more.

And then, what purpose does whispering have if all our thoughts are on display for others to hear?

I don't know about you, but I whisper when:

  1. I don't want other people to hear me
  2. I don't want to be disruptive to my environment
  3. I want to avoid the attention that naturally comes from speaking

A device needs your words to be loud enough to register them. So you'd better start investing in noise-cancelling headphones, because an already loud world is going to be a lot louder.

Enforcing the Proper Use of Speech Recognition

I've already acknowledged that SR can be incredibly powerful and will grow to be more useful. The problem is that the technology has been provided at the consumer level and used in public spaces before it has matured enough.

You could train yourself to ignore all the people using speech recognition. But why not train others not to create the offending situations instead?

I won't offer up a list of “do's and don'ts,” but I will give you some rules of thumb. Here are some simple things you can do to help enforce the proper use of SR.

  • Point out to people you see in public or at the dinner table when their decision to use speech recognition is disruptive or inappropriate.
  • Acknowledge that sometimes it's still easier, faster, and better for everyone around you to control the world with something other than your mouth.
  • Learn about the growing situations where your voice can do wonderful things. The resources below will help tremendously.

SR Resources

If you're still a little foggy about SR here are some great resources to learn more.

  • How Stuff Works: This is a great place to start as they have a super article about how SR works.

  • Capturing “Ah ha” Moments by Clive Thompson: A main point he makes is that “we might talk at 200 words a minute, but we can jot (or type) notes at only 25 or 30 words a minute.” This is short, to the point, and informative.
  • Tools to blog or write with SR by Jon Morrow: I've watched the 20-minute video twice and it's an awesome illustration of using your voice to form a masterpiece. And anything you don't find there you can find at KnowBrainer.
  • Speech Technology Magazine is for people who are really interested in SR. The navigation and presentation of information are poor but the content is great.
  • Mark's Daily Apple: The earlier Mark Sisson quote comes from a great post about understanding the meaning and importance of silence. I don't know about you, but I agree with him that most of us need it frequently to function.

The Takeaway

Whether it's SR or using another new technology, my question is always this:

What impact are you having on other people when you adopt a new technology?

Seemingly harmless technology often comes with major downsides. Exploring those downsides before adopting it will go a long way to limiting the negative impact it has on others.

So accept my apologies when I get annoyed visiting your technology-enabled home of the future. You know, the one where you control the smell of each room, the shade of color on the walls, and the minerals in the tap water with your voice. Hey, if you can't get close enough to manipulate the world by touching it, just yell at it instead.

Photo credit: marsmet511