Um, Tell Me What You Mean

Review: 'How We Talk: The Inner Workings of Conversation' by N.J. Enfield

November 18, 2017

It's like, you know, hard to speak quickly and fluently. To patter on without fillers and pauses. Without ums and ahhs. The testimony of practiced speakers—from political orators to the folks who get up at meetings of Toastmasters International—is that these dribbled bits of linguistic grouting can be scraped away from speech, if we put in the work to teach ourselves to eliminate them.

And yet, well, I kinda wonder if, like, we should be so happy to embrace what is, in essence, professional speech. I mean, to be human is to be, at best, an inspired amateur at this whole business of, um, talking. Right? We all do it, some of us do it well, but the next step up to the high gloss of rhetorical polish, with nary a flaw and nary a falter, is to do something other than talking. Rhetors seek to be angels, delivering the faultless prose of higher realms. And more power to them, I guess. Still, it’s a move away from ordinary speech. A move away from some of the purposes of talking. A move away from the human.

Imagine a child running in from the yard to ask, "Mom, can I have a cookie?" The mother pauses for 250 milliseconds or so, and then says yes or no. Now imagine a child running in from the yard to ask, "Mom, can I have some clothes to dress Fido up as a clown for the circus parade we want to put on?" And the mother pauses for 650 milliseconds or so, before saying, "Um, tell me what you mean." The old psycholinguistic analysis (the kind of thing we had from Freud through Chomsky) would have looked to the complexity of the question and the time needed for the mother's individual brain to parse and process what the child had said. The newer work that sociolinguistics has done over the past few decades would point out that a social function is being served, too. By hesitating ever so slightly before giving an answer, the mother has signaled her hesitations about what the child has proposed.

The social functions of ums and ahhs are the subject of How We Talk: The Inner Workings of Conversation, a new book by the Australian linguistics professor N.J. Enfield. Tests done in languages scattered across the globe—from Italy to Laos, Denmark to Korea, Mexico to Papua New Guinea—suggest that human beings, on average, begin to reply to conversational queries in around 200 milliseconds.

Of course, much of the time, the answer starts with a dribbled fragment, each language's equivalent of English's "well" or "yeah" or "um." These are placeholders, in the literal sense of the word: They hold the speaker's place in the conversation while the brain sorts out how to formulate the answer. Speech is both unique to the speaker and shared in the social situation, with conversation being "a set of powerful social and interpretive abilities of individuals in tandem with a set of features of communicative situations." And the social purposes of the conversational elements are as significant as the individual formulations.

In Enfield's analysis, human conversation across cultures is defined by a social unease that begins to develop after a break of 600 milliseconds or so. One of the primary purposes of ums and ahhs is a kindness to those around us, a fulfilling of a neighborly duty that keeps others from growing worried or disengaged from the social encounter.

Then, too, we use fillers to buffer speech, politely signaling apparent regret or seeming diffidence about refusing an invitation or disagreeing with someone. For that matter, we use fillers to signal incomprehension without breaking the social flow of conversation. Think how often you've heard or used "Huh?" or "Beg your pardon?" or any of the other common snap responses to a failure to hear or understand what someone has just said.

In How We Talk, these filler words are taken as definitively human in the sense that they serve a profound social function for human beings. Conversation happens so rapidly that we need ways to signal to one another that we are attending to the shared speech, that we understand turn-taking is happening, and that we are encouraging further conversation. Sentence by sentence, all the ums and ahs may seem disruptive and awkward. Over the course of a social encounter, however, they are actually devices for easing the way—the grease by which the give and take of conversation is allowed to happen smoothly.

Or, in Enfield's formulation, they are patches that repair flaws and hitches in conversation. Such "repair sequences," he writes, occur in informal conversations, on average, once every 84 seconds—across all human languages. In a recent Wall Street Journal essay, Enfield points to a sample of more than 23 million spoken English words in which the linguistics professor Mark Liberman found that one out of every sixty words is "um" or "uh."

Interesting consequences follow from this social analysis of filler words. The first is that human languages seem to have the capacity to convey in conversation meta-data about the conversation itself. Ums and ahs—like mmms intended to signal that the listener is engaged in the conversation without interrupting the speaker—carry information about the flow of the conversation, not the topic being discussed in the conversation. With filler words acting as traffic controls on conversation, the time they use proves trivial when compared with the time they save.

Even more, ums and ahs act to maintain the orderly community that is a conversation. And here we reach the second way in which N.J. Enfield believes that filler words are profoundly human. How We Talk argues that research shows little to no such precise social functions in the vocal or gestural communications of animals. We find these traffic controls and meta-conversational elements solely in language—which is to say, we find them in all human languages and not in any nonhuman animal communications.

Aristotle began his political theory with the definition of man as, by nature, a political animal. And the root of the political is the polis, the city—the social interactions of human beings living among one another. Conversation is the central activity that turns the human animal into the political animal. We speak, and so are qualitatively different from other animals.

Most of us wouldn't have thought to go to ums and ahs, huhs and mmms, to prove the point. The trainers of orators and toastmasters—our parents and schoolteachers, for that matter—taught us that all those filler words were failures of the speech that make us human. But N.J. Enfield's research and analysis in How We Talk will strike readers as persuasive. It's like, you know, hard to speak quickly and fluently. But, well, um, maybe, we should learn just to go with the flow.
