
Blog posts

by Bad Horse

Chapter 16: High-entropy writing


PrettyPartyPony's blog post yesterday got me thinking about what we mean by "wordiness". We don't mean having "too many" words; if we meant that, we'd just say "long". We mean having words that don't do much.

High-entropy writing

In 1948, Claude Shannon published "A Mathematical Theory of Communication", an essay (or very short book) that's surprisingly quick and easy to read for something with such profound mathematical content. It's one of the three cornerstones of science, along with Euclid's Elements and Newton's Principia. It provided equations to measure how much information words convey. Let me repeat that, shouting this time, because the implications surely didn't sink in the first time: It provided EQUATIONS to measure HOW MUCH INFORMATION WORDS CONVEY.

These measurements turn out to be isomorphic (that's a big word, but it has a precise meaning that is precisely what I mean) to the concept of thermodynamic entropy. The exact method Shannon used to measure information per letter in English is crude, but it's probably within 20% of the correct answer. The important point is that, for a given text and a given reader, there is a correct answer.
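If you're curious what the crudest version of that measurement looks like, here's a minimal sketch, with its assumptions stated up front: it's a zeroth-order estimate that looks only at single-letter frequencies (which gives roughly 4 bits per letter), whereas Shannon's estimates also used context, which pushes real English down toward 1 bit per letter. The filename is just a stand-in.

    import math
    from collections import Counter

    def entropy_per_letter(text):
        # Zeroth-order estimate: H = -sum(p * log2(p)) over single-letter frequencies.
        letters = [c.lower() for c in text if c.isalpha()]
        counts = Counter(letters)
        total = len(letters)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    # "story.txt" is a stand-in for whatever text you want to measure.
    print(entropy_per_letter(open("story.txt").read()))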

The implications of being able to measure information are hard to take in without thinking about it for a few decades [1]. For writers, one implication is that the question "Is this story wordy?" has an answer. I could write a simple program that would analyze a story and say how wordy it was.

The caveat is simple, subtle, and enormous: A given text conveys a well-defined amount of information to a given reader, assuming infinite computational resources [2]. Without infinite computational resources, it depends on the algorithms you use to predict what's coming next, and there are probably an infinite number of possible algorithms. I could easily compute the information content of a story by predicting the next word of each sentence based on the previous two words. This would warn a writer if their style were cliched or vague. But it would miss all the information provided by genre expectations, our understanding of story structure and theme, psychology, and many other things critical in a story.
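Here is a toy sketch of that two-previous-words predictor, with some loud caveats: it trains on the very story it scores, so what it really measures is how repetitive the story is; a real tool would first train the predictor on a large corpus of English. The filename is a stand-in and the smoothing is the crudest possible.

    import math
    from collections import Counter, defaultdict

    def bits_per_word(words):
        # Predict each word from the two words before it and report the average
        # surprise in bits. Lower numbers mean more predictable, wordier prose.
        model = defaultdict(Counter)
        for a, b, c in zip(words, words[1:], words[2:]):
            model[(a, b)][c] += 1
        vocab = len(set(words))
        total_bits = 0.0
        n = 0
        for a, b, c in zip(words, words[1:], words[2:]):
            seen = model[(a, b)]
            p = (seen[c] + 1) / (sum(seen.values()) + vocab)  # add-one smoothing
            total_bits -= math.log2(p)
            n += 1
        return total_bits / n

    words = open("story.txt").read().lower().split()  # "story.txt" is a stand-in
    print(round(bits_per_word(words), 2), "bits per word")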

But you can be aware of the information content of your story without writing that program or understanding how to measure entropy. One simple way is to be aware of the information content of the words you use. Writers say to use precise words and avoid vague ones. But that's not quite right. What they really mean is, use high-entropy words. A high-entropy word is one that can't be easily predicted from what came before it. The word "fiddle" is usually unexpected, but is expected if you just said "fit as a".

Fill in the blanks:

She headed to the right, past the empty bar and the plastic display case of apple and coconut creme pies, towards a tall, lean blonde in a faded orange miner's jumpsuit who was sprawled on a chair at the end of a booth, tilting it backwards into the aisle, her arms dangling.

— some hack writer, Friends, with Occasional Magic

A breeze blew through the room, blew curtains in at one end and out the other like pale flags, twisting them up toward the frosted wedding-cake of the ceiling, and then rippled over the wine-colored rug, making a shadow on it as wind does on the sea.

— F. Scott Fitzgerald, The Great Gatsby

See how the words in the second passage are harder to predict?

High-entropy writing can simply mean putting things together that don't usually go together:

The ships hung in the sky in much the same way that bricks don't.

— Douglas Adams, The Hitchhiker's Guide to the Galaxy

An AMERICAN wearing a jungle hat with a large Peace Sign on it, wearing war paint, bends TOWARD US, reaching down TOWARD US with a large knife, preparing to scalp the dead.

— From a 1975 draft of the screenplay for Apocalypse Now by John Milius and Francis Ford Coppola

When you use a word that's true and unexpected, it's poetry. When you tell a story that's true and unexpected, it's literature [3]. So aim for the unexpected plot and the unexpected word.

Meaning-dense writing

This is taken a bit too far in modernist poetry, which has very high entropy:

dead every enourmous [sic] piece

of nonsense which itself must call

a state submicroscopic is-

compared with pitying terrible

some alive individual

— E.E. Cummings, dead every enourmous piece

The problem with measuring information content is that you would produce the most-unpredictable sequence of words by choosing words at random. Meaningless text has maximum information density.

What you want to measure is true, or, better, meaningful, information [4]. Writers often use words and tell stories that are technically low-entropy (the words aren't unexpected). But whenever they do, if it's done well, it's because they convey a lot of extra, meaningful information that isn't measured by entropy.

To convey a mood or a metaphor, you choose a host of words (and maybe even punctuation) associated with that mood. That makes that cluster of words appear to be low-entropy: They all go together, and seeing one makes you expect the others.

The sky above the port was the color of television, tuned to a dead channel.

— William Gibson, Neuromancer

All the world's a stage, and all the men and women merely players;

They have their exits and their entrances;

And one man in his time plays many parts, His acts being seven ages.

— William Shakespeare, As You Like It

The yellow fog that rubs its back upon the window-panes,

The yellow smoke that rubs its muzzle on the window-panes

Licked its tongue into the corners of the evening,

Lingered upon the pools that stand in drains,

Let fall upon its back the soot that falls from chimneys,

Slipped by the terrace, made a sudden leap,

And seeing that it was a soft October night,

Curled once about the house, and fell asleep.

— T.S. Eliot, The Love Song of J. Alfred Prufrock, 1915

The fog comes

on little cat feet.

It sits looking

over harbor and city

on silent haunches

and then moves on.

— Carl Sandburg, Fog, 1916

(Wait, did Sandburg really blatantly rip off Eliot's most-famous poem? Yes. Yes he did.)

In a metaphor or a mood, the words convey more information than you see at first glance. That someone would compare the sky to a television, and that the channel it's tuned to is dead, tells you a lot about Gibson's world. That men and women are "merely players" conveys a philosophy. An extended metaphor doesn't just tell you the information in its sentences. It points out which parts of the two things being compared are like each other, in a way that lets you figure out the different similarities from just a few words. That is extra meaning that isn't measured by entropy (but would be by Kolmogorov complexity). It may be low-entropy, but it's meaning-dense.

Rhyme greatly decreases the entropy of the rhyming words. Knowing that you need a word that has something to do with a frog and rhymes with "Frog" reduces the number of possible final words for this poem to a handful. Yet it's still surprising—not which word Dickinson picked, but all the things it meant when she suddenly compared public society to a ...

How dreary—to be—Somebody!

How public—like a Frog—

To tell one's name—the livelong June—

To an admiring Bog!

— Emily Dickinson, I'm Nobody! Who are You?

Sometimes you use repetition to connect parts of a story:

        ‘Twas the day before Hearthwarming, and a nameless horror had taken residence in Dotted’s chimney. Again.

...

‘Twas the day before Hearthwarming, and a nameless horror had taken residence in Spinning Top’s chimney.

... or to focus the reader's attention on the theme:

“It’s just that I’ve plans for Hearthwarming and—”

... “Don’t you worry about me. I’ve plans for this Hearthwarming."

... “Indeed, Your Excellency. I’ve plans for Hearthwarming.”

... “Yes. I am. Now go. I’ll keep. Don’t you worry. I’ve plans for Hearthwarming.”

... He had plans this Hearthwarming.

— GhostOfHeraclitus, A Canterlot Carol

... or to make a contrast:

Smash down the cities.

Knock the walls to pieces.

Break the factories and cathedrals, warehouses and homes

Into loose piles of stone and lumber and black burnt wood:

You are the soldiers and we command you.

Build up the cities.

Set up the walls again.

Put together once more the factories and cathedrals, warehouses and homes

Into buildings for life and labor:

You are workmen and citizens all: We command you.

— Carl Sandburg, And They Obey

That's okay. The repetition is deliberate and is itself telling you something more than the sum of what the repeated parts would say by themselves.

Predictable words are vague words

Vague words may have lots of meaning, yet convey little information because we're always expecting someone to say them.

What words do I mean? I refer you to (Samsonovich & Ascoli 2010). These gentlemen used energy-minimization (one use of thermodynamics and information theory) to find the first three principal dimensions of human language. They threw words into a ten-dimensional space, then pushed them around in a way that put similar words close together [5]. Then they contrasted the words at the different ends of each dimension, to figure out what each dimension meant.

They found, in English, French, German, and Spanish, that the first three dimensions are valence (good/bad), arousal (calm/excited), and freedom (open/closed). That means there are a whole lot of words with connotations along those dimensions, and owing to their commonality, they seldom surprise us. Read an emotional, badly-written text—a bad romance novel or a political tract will do—and you'll find a lot of words that mostly tell you that something is good or bad, exciting or boring, and freeing or constrictive. Words like "wonderful", "exciting", "loving", "courageous", "care-free", or "boring". Read a badly-written polemic or philosophy paper, and you'll find related words: "commendable", "insipid", "bourgeois", "unforgivable". These are words that express judgements. Your story might lead a reader toward a particular judgement, but stating it outright is as irritating and self-defeating as laughing at your own jokes.

Our most-sacred words, like "justice", "love", "freedom", "good", "evil", and "sacred", are these types of words. They are reifications of concepts that we've formed from thousands of more-specific cases. But by themselves, they mean little. They're only appropriate when they're inappropriate: People use the words "just" or "evil" when they can't provide a specific example of how something is just or evil.

Avoid these words. Don't describe a character as an "evil enchantress"; show her doing something evil.

Sometimes they're the right words. Most of the time, they're a sign that you're thinking abstractly rather than concretely. More on this in a later post.

It's meaningful for characters to be vague!

The flip side is, have your characters use these words to highlight their faulty thinking! Pinkie describes Zecora as an evil enchantress to show that Pinkie is jumping to conclusions. Rainbow Dash calls things "boring" to show that she's just expressing her prejudices and isn't open to some kinds of things.

[1] 70 years later, my current field, bioinformatics, is crippled because biologists still won't read that book and don't understand that when you want to compare different methods for inferring information about a protein, there is EXACTLY ONE CORRECT WAY to do it. Which no one ever uses. Same for linguistics. Most experts don't want to develop the understanding of their field to the point where it can be automated. They get upset and defensive if you tell them that some of their questions have a single mathematically-precise answer. They would rather be high priests, with their expertise more art and poetry than science, free to indulge their whimsies without being held accountable to reality by meddling mathematicians.

[2] And assuming some more abstruse philosophical claims, such as that Quine's thesis of ontological relativity is false. Which I have coincidentally proven.

[3] When you tell a story that's false and expected, it's profitable.

[4] The closest anybody has come to defining how much meaning a string of text has is to use Kolmogorov complexity. The Kolmogorov complexity of a text is the number of bits of information needed to specify a computer program that would produce that text as output. A specific random sequence still has Kolmogorov complexity equal to its length if you need to reproduce it exactly. But you don't need to reproduce it. There's nothing special about it. The amount of meaning in a text is the amount of information (suitably compressed) that was required to produce that text, and that is small for any particular occasion on which you produce any particular random sequence.
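Kolmogorov complexity itself can't be computed exactly, but compressed size is the usual computable stand-in, and it's enough to show the difference between patterned and random text. Note that this is a proxy for complexity only, not for the refined notion of meaning above.

    import random, string, zlib

    def compressed_bits(text):
        # The compressed size is an upper bound on the information needed
        # to specify a program that reproduces the text.
        return 8 * len(zlib.compress(text.encode("utf-8"), 9))

    repetitive = "the cat sat on the mat. " * 100
    gibberish = "".join(random.choice(string.ascii_lowercase + " ") for _ in range(2400))

    print(compressed_bits(repetitive))  # small: the pattern is cheap to specify
    print(compressed_bits(gibberish))   # large: no pattern to exploit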

[5] People usually do this by putting words close to each other that are often used in the same context (the same surrounding words), so that "pleasant" and "enjoy" are close together, as are "car" and "truck". This work instead took antonyms and synonyms from a thesaurus, and pushed synonyms towards each other and pulled antonyms apart from each other.
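Here's a toy reconstruction of that push-and-pull idea, and nothing more than that: the word lists, step sizes, and target separation are made up for illustration, and the actual procedure in Samsonovich & Ascoli (2010) differs in its details and scale.

    import numpy as np

    words = ["good", "bad", "happy", "sad", "calm", "excited"]
    synonyms = [("good", "happy"), ("bad", "sad")]
    antonyms = [("good", "bad"), ("happy", "sad"), ("calm", "excited")]

    rng = np.random.default_rng(0)
    pos = {w: rng.normal(size=10) for w in words}   # a ten-dimensional space, as described above

    for _ in range(500):
        for a, b in synonyms:                       # pull synonyms toward each other
            d = pos[a] - pos[b]
            pos[a] -= 0.05 * d
            pos[b] += 0.05 * d
        for a, b in antonyms:                       # push antonyms toward a fixed separation
            d = pos[a] - pos[b]
            dist = np.linalg.norm(d) + 1e-9
            force = 0.05 * (2.0 - dist) * d / dist
            pos[a] += force
            pos[b] -= force

    for w in words:
        print(w, np.round(pos[w], 2))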

Alexei V. Samsonovich & Giorgio A. Ascoli (2010). Principal Semantic Components of Language and the Measurement of Meaning. PLoS ONE 5(6): e10921.
