I'm delighted to say that Erin McKean will be guest-blogging here this week. Ms. McKean is the Oxford University Press's Chief Consulting Editor for American Dictionaries; the editor in chief of the New Oxford American Dictionary (2d ed.); the author or coauthor of Weird and Wonderful Words, More Weird and Wonderful Words, Totally Weird and Wonderful Words, and That's Amore; and the author of both the Dictionary Evangelist blog and the A Dress a Day blog. She reports that she'll be blogging this week about Dictionary Myths, "the things people fervently believe about dictionaries, but sadly, aren't true."
Part 1: The Myth of the Lexicographer-Judge
What's a dictionary myth? A dictionary myth is something that people believe about dictionaries (and words as they appear in dictionaries) that simply isn't true. They can be semi-harmless dictionary urban legends (like the one that holds that antidisestablishmentarianism is the longest word in the dictionary) or they can be more pernicious, such as the widespread belief that if a word isn't in "the dictionary", it's not a real word.
A little bit more on the pernicious side is the belief that lexicographers -- the folks who edit the dictionary -- are somehow on a higher plane of word usage than the common person, and that they make decisions as to what does and does not enter the hallowed ranks of dictionary-words based on some exquisite aesthetic sense, some finely-tuned Sprachgefühl, a kind of lexical perfect pitch.
This, I hasten to assure you, is flatly not true.
Lexicographers are not the word-judging equivalents of the literary critic or the music reviewer; they're not the curators of the word museum. The lexicographer is, or should be, a scientist-journalist combo. They should research what words are actually being used, how, where, when, and by whom, and then report these facts of usage to the public in a clear, timely, straightforward manner. [Don't worry, we'll discuss what happens when what the lexicographer finds is 'wrong' later this week.]
Of course, the problem with current dictionaries (and pretty much all dictionaries everywhere at all times) is that there are often more words to be reported on than there is space in the printed book. So how to decide which words make it onto the page, and which don't? Lexicographers don't cherry-pick the pretty words, or the words with the best etymologies, or the words that are used in their favorite novels: they pick the words that will be of the most use to the largest group of people. They report 'newsworthy' words, words that they think will have sticking power, words that seem serviceable and sturdy, good for the long haul. (And, let's not forget, because making dictionaries is a commercial endeavor, they also pick words that will get publicity, attract attention, and drive consumers to their product. Those words are the chrome trim on the family sedan.)
Perhaps starting out a week of guest-blogging about dictionaries by undermining my own authority is not the brightest of bright ideas, but I feel curiously compelled to do it. By removing any special glamour from my job -- by making it just a job, and not a calling -- I hope that it will be easier to talk about the underlying data (how we know what we know about words) and to talk about the possible dictionaries of the future, instead of arguing about taste (because, as we all know, de gustibus non est disputandum).
Oh, and by the way, the other myth about lexicographers is that we are horrified, appalled, and indeed, quite put out when we see misspellings, nonstandard usages, slang, or informality in general. This is ridiculous -- it's like expecting doctors to faint at the sight of blood. Our usual reaction to a word we haven't seen before (especially slang!) is "ooh, interesting!" We feel the same way about "errors," too, for the most part. Every error, every place where the language system breaks down, is a chance to deduce how language works, in the same way that every neurological injury gives us hints as to how the brain ought to function. So, please, don't let the fear of making a mistake in front of the lexicographer keep you from commenting!
Here's a question that I posed to Erin McKean, and that she graciously agreed to answer in the next few days.
The general question is: How do (and should) lexicographers decide whether to include a word in the dictionary?
The concrete example, contributed by Widener lawprof Ben Barros, is offered by the words "inartful" and "inartfully." Prof. Barros and I were both shocked to learn that the two words weren't in the OED or any dictionary accessible via onelook.com.
Some 20 years ago, William Safire wrote about inartful, and said "it is not a word." But of course that's wrong: It's a word that lawyers use often, though generally without recognizing it as legalese, and that nonlawyers seem to use on occasion as well.
A Lexis search through MEGA;MEGA for inartful! and date(> 1/1/2007) uncovers over 600 uses this year alone. Searches for past uses reveal published cases or summaries of lawyers' arguments using that term dating back to 1832. A search through a database of scanned 1700s English books found references from 1751 (Edward Kimber's The Life and Adventures of Joe Thompson 244 (2d ed.)) and 1759 (Samuel Derrick's [?] A General View of the Stage 26 (1759)). Kimber used it to mean "artlessly," but Derrick used it to mean "unskillfully," which seems to be the dominant modern meaning.
For whatever it's worth, unartful and unartfully do appear in Webster's 1913 Revised Unabridged Dictionary, though Google suggests that unartfully is over 30 times less common than inartfully.
So my questions to Ms. McKean: Any thoughts on why the word isn't in the dictionaries, how lexicographers would decide whether to include it, and what people should do in the meantime?
So yesterday Eugene asked me why the word inartful (meaning 'unskillful') wasn't in any dictionary that he'd consulted, including the OED and all the dictionaries you can search through onelook.com.
He pointed out that this word has been used more than 600 times in 2007 alone, and that he'd found a cite going back to 1751, which seems like plenty of evidence of use to guarantee a seat at the grown-up table for inartful. So why has inartful gotten the go-by?
When thinking about how words enter a dictionary, the most important thing to understand is that there are many, many more words than there are places in any current dictionary. Because of this scarcity, lexicographers are driven to a kind of triage. Often, the question isn't 'how can I justify including this word?' but 'how can I justify EXCLUDING this word?'
It wasn't that lexicographers just weren't lucky enough to run into the word: we should assume that a lexicographer at one point saw and considered inartful, however briefly. How can I assume that? Well, Ben Zimmer at OUP let me know that the word shows up three times in the citations database the OED uses (Incomings) and shows up thirteen times in the Oxford English Corpus. In the natural course of things, then, a report should have been run on all the words in either of those databases that aren't already listed as defined words, and the output of that report looked over.
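For the programmers in the audience, the kind of report described above can be pictured as a simple set difference. This is only a sketch under stated assumptions, not a description of OUP's actual tooling: the function name, the word lists, and the headword list are all hypothetical, and the two counts for inartful simply mirror the figures quoted above.

```python
from collections import Counter


def undefined_word_report(citation_words, corpus_words, headwords):
    """List words seen in either source that have no dictionary entry yet.

    Returns (word, citation_count, corpus_count) rows, most frequent first.
    """
    defined = {w.lower() for w in headwords}
    citations = Counter(w.lower() for w in citation_words)
    corpus = Counter(w.lower() for w in corpus_words)

    # Words attested in either source, minus those already defined.
    candidates = (set(citations) | set(corpus)) - defined

    rows = [(w, citations[w], corpus[w]) for w in candidates]
    # Sort by total evidence (descending), then alphabetically for stability.
    rows.sort(key=lambda r: (-(r[1] + r[2]), r[0]))
    return rows


# Toy data (hypothetical apart from the two counts for "inartful").
citation_words = ["inartful"] * 3 + ["inaudible"] * 5
corpus_words = ["inartful"] * 13 + ["inarticulate"] * 40
headwords = ["inarch", "inarticulate", "inaudible"]

report = undefined_word_report(citation_words, corpus_words, headwords)
for word, n_citations, n_corpus in report:
    print(word, n_citations, n_corpus)
```

The hard part, of course, is not producing the list but deciding what to do with each word on it.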
So once inartful was on that list, why didn't it shoot straight through the process to definition and publication?
Well, inartful is fairly easily and superficially defined as "not artful". Often in- or un- words are not separately defined in dictionaries (again, the lack-of-space problem) because lexicographers assume that someone who knows in- and audible can put the pieces together on their own and figure out inaudible without our help. (These words, in lexicogger jargon, are often called derivatives or run-ons, because they are derived from and often run on to an entry, where they appear at the end, after all the definitions, in bold type.)
Of course, there are difficulties with this 'solution' to the space problem, and inartful is a good example of how this assumption of transparent meaning can go wrong. It seems that at the time inartful started to be used, artful meant, plainly, "Displaying or characterized by technical skill; performed or executed in accordance with the rules of art; artistic" [OED]. Artful had not yet taken on its later meaning of "Cunning, crafty, deceitful." [OED] So the in- + artful reading of inartful stopped working, along about the time that artful took on an unsavory character. It's not that the lexicographers were slacking off here (although it may seem that way). It may take several revision cycles for all of the in- and un- (and for that matter the re- and non-) words to get reviewed to make sure that their base word hasn't skewed off in a different direction than the prefixed version. Of course, inartful didn't even get the run-on treatment, as far as I can tell. If a word is rare enough, and it has no changes in its spelling or stress pattern or pronunciation when the prefix or suffix is added, even the run-on space may be judged as too valuable to waste on so marginal a word.
Another reason that inartful could have been left off the guest list of the A-Z party is that the word seems to be used mainly by lawyers. It's not anti-lawyer prejudice on the part of lexicographers (we tend to love lawyers, because lawyers tend to love dictionaries, and, more importantly, buy them ... ). But lexicographers know both that legal terminology tends to be contained in the law world and that the law world has good dictionaries of its own, which will provide adequate coverage. (Black's, especially, although I haven't been able to look up inartful in Black's. I don't know where my copy went!) Any kind of very specific jargon or restricted terminology won't show up in a general dictionary (I'm not really talking about the OED here, which does include a lot of specialist terminology) unless the lexicographer can show that the term does show up often enough in broader contexts.
Which should have been the case, actually, for inartful, since it was the subject of the Safire contretemps Eugene mentioned yesterday (Safire claimed it "wasn't a word"), AND then he retracted that claim ... based in part on testimony from a lawyer, Fred Shapiro. When a word is the subject of a public debate over its wordiness, that to me says it's a good candidate for inclusion in a general dictionary ... but I'm excusing myself from responsibility for following that particular debate, since it happened in 1985, when I was fourteen and not yet reading the Sunday New York Times on a regular basis (I don't think you could get home delivery of the NYT in Winston-Salem, N.C. in 1985).
(Although now that the NYT archive is open, does anyone want to help me do a survey of all the On Language columns to see what percentage of words discussed are, in fact, included in major dictionaries? I'll take volunteers in the comments, on a first-come, first-served basis ...)
Those are just two of the reasons you can't find inartful between inarch and inarticulate in your dictionary. They're not especially good reasons, but then again the reasons for most failures aren't especially good ones.
And, really, I don't think inartful is an especially rare case. I encounter a word that's not in a dictionary but probably could be nearly every single day. Grant Barrett has built an entire web site that lists words that aren't in the major dictionaries. Dictionaries are probably only the tip of the English iceberg — there might be as much as 90% of the language hiding below the waterline.
So tomorrow I think I should discuss how lexicographers could keep this kind of inartful failure from happening, both ideally and practically. If you want a sneak preview, you might want to check out the video of my talk at the TED conference, where I discuss this same topic. (Warning: sound starts immediately as the page loads.)
Part 2: The Myth of the Online Dictionary
So (as several of you have asked in the comments, with varying levels of plaintiveness) why don't dictionaries just go completely online, and include every word? There'd be none of this stupid in-or-out waffling on the part of the lexicographers; they could just muster the words in an orderly fashion and march them onto the web, break for a long lunch, and go home early.
It's no secret that I'm a big fan of this include-everything-on-the-web idea. I'm seethingly impatient for it. I want it hot, fresh, and now, and I'm grumpy that I don't have it yet.
Don't have it yet? But what about OED.com, dictionary.com, onelook.com, bartleby.com, m-w.com, Wiktionary, OmegaWiki ... there's no shortage of dictionaries you can see online. What there's a shortage of is true Online Dictionaries.
A dictionary online is just a print dictionary translated to the web, with little, if any, attention paid to the advantages of web delivery. A few links, a couple of different font options -- that's it. The basic arrangement, format, layout ... those remain largely unchanged. (A couple of the online dictionaries don't even allow full-text search inside definitions! So if you can't remember the word, you can't triangulate it by looking for words you think might be used in its definition.)
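To make the idea of searching inside definitions concrete, here is a minimal sketch of that kind of reverse lookup. Everything in it is an assumption for illustration: the function name is made up, the three definitions are rough paraphrases rather than text from any real dictionary, and a real system would want proper tokenizing and ranking rather than plain substring matching.

```python
def reverse_lookup(entries, *terms):
    """Return headwords whose definitions contain every search term."""
    wanted = [t.lower() for t in terms]
    return sorted(
        word
        for word, definition in entries.items()
        if all(term in definition.lower() for term in wanted)
    )


# Toy entries with paraphrased, hypothetical definitions.
entries = {
    "inarch": "graft a plant by joining a growing branch to another stock",
    "inarticulate": "unable to speak distinctly or express oneself clearly",
    "inaudible": "unable to be heard",
}

# You can't remember the word, but you know roughly how it would be defined.
matches = reverse_lookup(entries, "unable", "heard")
print(matches)
```

Adding a second remembered word narrows the candidates, which is exactly the triangulation that a search box limited to headwords can't do.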
Everything in the dictionary-online is still seen through the lens of print, and what print needs. The web is an afterthought. Even the wiki-style dictionaries (which I am all in favor of, and I'm on the advisory board of the Wikimedia Foundation) are largely based on print ideals of organization and inclusion. (Even Wiktionary wants words to be at least a year old before they are included in the project.)
A true Online Dictionary would be created with the web in mind. And it might not look the way we think a dictionary "should" look at all!
Print dictionary layout is optimized (or possibly ossified) for print delivery. Dictionary layout has remained largely unchanged for hundreds of years: look at a page of Johnson's Dictionary, and you recognize it immediately: "That's a dictionary." But is that format, time-tested as it is, the best one for an online dictionary? I am not convinced it is. But no dictionary that I'm aware of is testing what a true online dictionary would or should or could look like.
Not only do I think the macrostructure of the dictionary will have to change online, I believe the microstructure of the entry probably will too. Do lexicographers still need to be crafting tight little knots of definitions if the pressure to explain everything in three lines or less is no longer there? Where's the sweet spot between "short, but impenetrable" and "too long for quick comprehension ... okay, now you're an encyclopedia"?
Because lexicographers' time isn't infinite, even if the web seems nearly so, they will still have to figure out the process of herding all the words into the new online dictionary. (I can see entries accreting over time as evidence of use piles up; the first embryonic uses of a word barely showing, with only one or two lonely examples, and the older words becoming like huge dripping stalactites as they accumulate hundreds of examples. You could gauge the longevity of a word by the shape of its entry.)
Before we can have a real Online Dictionary we have to figure out how people will use it, what they really need and what they simply want. Then we can figure out what it will look like, how it will behave, and what it should contain.
We also need to figure out how we can fund it. How will people pay for online dictionary content, if at all? Per word micropayments? Subscriptions? A tiered subscription with basic words being free, but harder or rarer words costing more? Paying a fee through their ISP? Taking it not-for-profit and being funded by grants? Advertising? Charging people to add their own words or definitions? (I'm just kidding about that last part, but I can imagine some people wouldn't be.) Pretty much the only funding option not available for the online dictionary is putting it between hard covers and selling it for $24.95 in Barnes and Noble ... because then you have to make the online dictionary with print in mind.
The true Online Dictionary is still a myth, sadly. But every day I think about how to make it into reality.
Inartful has made it into the Oxford English Dictionary (June 11 additions):
Lacking artifice; unsophisticated, unrefined; wanting polish or technical skill. Later also: unsubtle, tactless.