[Erin McKean, guest-blogging, September 26, 2007 at 11:13am]
Guestblogging Dictionary Myths:

Part 2: The Myth of the Online Dictionary

So (as several of you have asked in the comments, with varying levels of plaintiveness) why don't dictionaries just go completely online, and include every word? There'd be none of this stupid in-or-out waffling on the part of the lexicographers; they could just muster the words in an orderly fashion and march them onto the web, break for a long lunch, and go home early.

It's no secret that I'm a big fan of this include-everything-on-the-web idea. I'm seethingly impatient for it. I want it hot, fresh, and now, and I'm grumpy that I don't have it yet.

Don't have it yet? But what about,,,,, Wiktionary, OmegaWiki ... there's no shortage of dictionaries you can see online. What there's a shortage of is true Online Dictionaries.

A dictionary online is just a print dictionary translated to the web, with little, if any, attention paid to the advantages of web delivery. A few links, a couple of different font options -- that's it. The basic arrangement, format, layout ... those remain largely unchanged. (A couple of the online dictionaries don't even allow full-text search inside definitions! So if you can't remember the word, you can't triangulate it by looking for words you think might be used in its definition.)
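
To make the "triangulate it" idea concrete, here is a minimal Python sketch of a reverse lookup that searches inside definitions. It is only an illustration, not how any existing dictionary site works: the three entries in `ENTRIES` and the function names `tokenize` and `reverse_lookup` are made up for the example.

```python
import re

# Hypothetical toy entries; a real dictionary would hold far more.
ENTRIES = {
    "lexicographer": "a person who compiles dictionaries",
    "stalactite": "a tapering structure hanging from the roof of a cave",
    "thesaurus": "a book that lists words in groups of synonyms",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def reverse_lookup(clues, entries=ENTRIES):
    """Return the headwords whose definitions contain every clue word."""
    wanted = tokenize(clues)
    return sorted(
        headword
        for headword, definition in entries.items()
        if wanted <= tokenize(definition)
    )


print(reverse_lookup("hanging cave"))  # prints ['stalactite']
```

Matching whole words exactly is the simplest possible choice; a real search would presumably also need to handle inflected forms and near-synonyms.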

Everything in the dictionary-online is still seen through the lens of print, and what print needs. The web is an afterthought. Even the wiki-style dictionaries (which I am all in favor of, and I'm on the advisory board of the Wikimedia Foundation) are largely based on print ideals of organization and inclusion. (Even Wiktionary wants words to be at least a year old before they are included in the project.)

A true Online Dictionary would be created with the web in mind. And it might not look the way we think a dictionary "should" look at all!

Print dictionary layout is optimized (or possibly ossified) for print delivery. Dictionary layout has remained largely unchanged for hundreds of years: look at a page of Johnson's Dictionary, and you recognize it immediately: "That's a dictionary." But is that format, time-tested as it is, the best one for an online dictionary? I am not convinced it is. But no dictionary that I'm aware of is testing what a true online dictionary would or should or could look like.

Not only do I think the macrostructure of the dictionary will have to change online, I believe the microstructure of the entry probably will too. Do lexicographers still need to be crafting tight little knots of definitions if the pressure to explain everything in three lines or less is no longer there? Where's the sweet spot between "short, but impenetrable" and "too long for quick comprehension ... okay, now you're an encyclopedia"?

Because lexicographers' time isn't infinite, even if the web seems nearly so, they will still have to figure out the process of herding all the words into the new online dictionary. (I can see entries accreting over time as evidence of use piles up; the first embryonic uses of a word barely showing, with only one or two lonely examples, and the older words becoming like huge dripping stalactites as they accumulate hundreds of examples. You could gauge the longevity of a word by the shape of its entry.)

Before we can have a real Online Dictionary we have to figure out how people will use it, what they really need and what they simply want. Then we can figure out what it will look like, how it will behave, and what it should contain.

We also need to figure out how we can fund it. How will people pay for online dictionary content, if at all? Per word micropayments? Subscriptions? A tiered subscription with basic words being free, but harder or rarer words costing more? Paying a fee through their ISP? Taking it not-for-profit and being funded by grants? Advertising? Charging people to add their own words or definitions? (I'm just kidding about that last part, but I can imagine some people wouldn't be.) Pretty much the only funding option not available for the online dictionary is putting it between hard covers and selling it for $24.95 in Barnes and Noble ... because then you have to make the online dictionary with print in mind.

The true Online Dictionary is still a myth, sadly. But every day I think about how to make it into reality.

Steve P. (mail):
I was rather impressed a few years ago when the Visual Thesaurus came out. It seemed like a nifty product, though I have doubts on how well their subscription service is selling. Perhaps a real online dictionary would include the visual/contextual elements to help relate words together?
9.26.2007 12:39pm
How about a recording for pronunciation to replace (or supplement) the current method?
9.26.2007 12:47pm
Salixquercus (mail):
I have an individual OED subscription since etymology is a hobby, but I don't like the monthly charge as I can get too busy and a month lapses. I'd prefer to pay $30 for 1,000 searches or something similar.
9.26.2007 1:43pm
Sasha Volokh (mail) (www):
Salixquercus: I don't understand, how does the OED help you study insects?
9.26.2007 2:35pm
Alex R:
A tiered subscription with basic words being free, but harder or rarer words costing more?

I know it's not what you would call an "online" dictionary, but isn't this effectively what Merriam-Webster offers now, with access to their Collegiate dictionary free, but access to the Unabridged dictionary by paid subscription?
9.26.2007 2:36pm
Salixquercus (mail):
A lot of words bug me.
9.26.2007 2:45pm
David Huberman (mail):
I'm struggling to understand the first part of Erin's post. It doesn't seem particularly difficult to envision a database framework, and a web GUI, for an online dictionary as Erin imagines could exist. Fully cross-referenced, searchable, and fully annotated entries are exactly what databases and web publishing offer.

As for the second part of Erin's post, I think it's no different than any other business that went digital. You plunge in and hope for the best, be it for-profit or not-for-profit. If you build it, they will come ... or not. But that's a perfectly normal business risk.
9.26.2007 2:49pm
David Huberman (mail):
What I wouldn't give for an edit function for comments on VC and SCOTUSBLOG. Oy vey.
9.26.2007 2:50pm
Tony Tutins (mail):
My wife can access the OED online through her employer -- she uses her ID number to access it at home. I would try to sell the service through libraries, who would let people use their card numbers from home.

I remember as a schoolkid we discovered how circular dictionary definitions could be -- defining words in terms of other words which were defined in terms of the words we were looking up in the first place. I imagine a hyperlinked dictionary would make this even more obvious.
9.26.2007 3:14pm
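
The circularity described above is easy to check mechanically once definitions are data. A minimal Python sketch, assuming a made-up three-entry dictionary (`DEFINITIONS`) and hypothetical helper names, that finds pairs of headwords defined in terms of each other:

```python
import re

# Hypothetical toy definitions, chosen so that some of them are circular.
DEFINITIONS = {
    "big": "large in size",
    "large": "big in size",
    "size": "how big or small a thing is",
}


def words_in(definition):
    """Return the set of lowercase words used in a definition."""
    return set(re.findall(r"[a-z]+", definition.lower()))


def circular_pairs(definitions):
    """Return sorted pairs of headwords whose definitions use each other."""
    pairs = set()
    for head, definition in definitions.items():
        for other in words_in(definition):
            if (
                other != head
                and other in definitions
                and head in words_in(definitions[other])
            ):
                # Sort each pair so (a, b) and (b, a) count once.
                pairs.add(tuple(sorted((head, other))))
    return sorted(pairs)


print(circular_pairs(DEFINITIONS))
```

This only catches two-word loops; longer chains (A defined by B, B by C, C by A) would need a proper cycle search over the graph of hyperlinks.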
Daniel San:
TT: I remember as a schoolkid we discovered how circular dictionary definitions could be -- defining words in terms of other words which were defined in terms of the words we were looking up in the first place.

Wittgenstein had a lot to say about that. Pretty well established that that's the way it is and that's the way it's going to be. But yes, hyperlinking could make all sorts of things obvious. A frequent complaint about searchable databases is that they eliminate the serendipity inherent in flipping through entries. Liberal use of hyperlinks re-introduces it but in a different form.
9.26.2007 3:23pm
Sasha Volokh (mail) (www):
Tony Tutins: As Robert Browning says:

Your business is to paint the souls of men--
Man's soul, and it's a fire, smoke . . . no, it's not . . .
It's vapour done up like a new-born babe--
(In that shape when you die it leaves your mouth)
It's . . . well, what matters talking, it's the soul!
Give us no more of body than shows soul!
9.26.2007 3:24pm
Erin (mail) (www):
I'm struggling to understand the first part of Erin's post. It doesn't seem particularly difficult to envision a database framework, and a web GUI, for an online dictionary as Erin imagines could exist. Fully cross-referenced, searchable, and fully annotated entries are exactly what databases and web publishing offer.

David -- the database underpinnings aren't hard to imagine, you're right. They're pretty bog-standard (and most dictionaries are already XML-tagged for editing). The hard part, I think, is really understanding what people need from a dictionary and delivering it, instead of just presenting them with the same information they've always gotten, only now faster! With animated icons! That is not going to be worth the new-business-model risk.
9.26.2007 4:33pm
Online Dictionaries are not a myth. I've seen them.

Sorry. Couldn't resist.
9.26.2007 4:43pm
Getting people to pay for dictionary content is probably a pipe-dream at best. This information age is not about creating an economy to sell information in. The ridiculously low cost of duplication of information makes this pretty clear. It's about having ubiquitous access to information and discovering the economic results of that ubiquity. (I paraphrase mercilessly here from a recent article by Cory Doctorow that I saw somewhere or other (probably via BoingBoing).)

Ads, grants, donations, and other non-retail mechanisms will inevitably fund future repositories of information like this.

BTW, the idea noted above for a link to an audio file with the pronunciation was also my first idea when you mentioned the ways that current online dictionaries fail to exploit the medium.
9.26.2007 6:25pm
I like the idea of charging people some relatively small amount to submit words. New words.

To make it catch on outside a small circle, you'd need to combine it with two other great pastimes, television and gambling.

Anyone can make up a word and pay $2 and it goes in the dictionary. The first time someone on MTV or CNN uses your word, you get $1000.

It'd be a hoot.

I'm not sure if this would clarify the question of what people want from dictionaries.
9.26.2007 7:43pm
New Pseudonym (mail):
I think the audio is a good idea, but:

What do we do with a word like parker (one who parks)?

British: Pahkah.

American: Parker.

Boston: Parkah.

Just an example, but a choice must be made between presenting one standard, or dialects in the audio. And if the answer is dialects (which I think it must), which and how many to present.

Also, since we are speaking of "A New English Dictionary based upon Historical Principles", do we eliminate words (or move them to some special link) when they have not been recorded in use for some period of time? Actually such a link might provide something that is not available in print. I presume the various editions of the NED/OED have removed words, not just added to those recorded earlier. I doubt many people are going to have copies of the different editions to compare if this is true.

By the way, I don't use the online OED just because one must pay to do so.
9.26.2007 8:07pm
I tend to go to first, then American Heritage (at I very much prefer having the tersest definition possible. One problem with an online dictionary with user contributions is definitions being added before they have truly entered the lexicon.

(Sometimes I do like knowing the etymology of a word and a good site for that is lacking. It would also be cool to have a dictionary where you could set the date of definition. In other words, you could tell it to give you common definitions of words in, say, 1776.)
9.26.2007 8:18pm
New Pseudonym (mail):
Also, looking at the "inartful" response, the larger storage available online permits the inclusion of words like this in an online dictionary without the concerns of space. It could be compartmentalized as well.
9.26.2007 8:18pm
Grant Barrett (mail) (www):
access to the Unabridged dictionary by paid subscription?

Actually, you can get access to that free, if you only look at an ad.
9.26.2007 9:40pm
Del Playa (mail):
I often use online dictionaries at work and always wished they had statistics about how common a word is. For example, this word appears an average of five times every million words. You could even get detailed and break it down by time (last three months, year, five years, 50 years, 100 years) and probably even subject too. For example, I have two words that mean the same thing, and I want to see which one is more common in, say, hardware trade magazines over the past five years.
9.26.2007 10:38pm
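
The "times per million words" figure suggested above is simple arithmetic: occurrences divided by total words, times one million. A minimal Python sketch, assuming the corpus is just a string of text; the function name `per_million` and the sample sentence are invented for illustration:

```python
import re
from collections import Counter


def per_million(word, text):
    """Return how often `word` occurs per million words of `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    # Counter returns 0 for words that never appear.
    return Counter(words)[word.lower()] * 1_000_000 / len(words)


# Hypothetical stand-in for a corpus of hardware trade magazines.
sample = "the bolt and the nut"
print(per_million("the", sample))
print(per_million("washer", sample))
```

Breaking the numbers down by period or subject would just mean running the same count over separate slices of the corpus, one per date range or magazine category.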
Bill Poser (mail) (www):
Some of the more interesting work on interfaces to on-line dictionaries is being done not with traditional monolingual dictionaries but with bilingual dictionaries for endangered languages. An interesting example is Kirrkirr, used initially for the Australian aboriginal language Warlpiri. One should also mention Wordnet, which is a lexical database of English of novel structure.
9.27.2007 12:33am
Bill Poser (mail) (www):
For languages like English, the advantages of online dictionaries are not as obvious as they are for languages with certain types of complex morphology. For some such languages, notably Athabaskan languages, the best known of which is Navajo, paper dictionaries cannot be both comprehensive and usable by non-experts. At the risk of immodesty, I recommend my own paper Making Athabaskan Dictionaries Usable and Mike Maxwell's and my Morphological Interfaces to Dictionaries.
9.27.2007 12:40am
Erin (mail) (www):
Bill -- thank you, that's a good point about Wordnet! I left it out as it doesn't seem to be aimed at general users, although general users do end up consulting it, as it's freely available and has a generous license ... and thank you for linking to those papers!
9.27.2007 10:22am
David R:
Don't ignore

Roll your eyes and vote thumbs down when necessary, but don't ignore it.
9.27.2007 2:46pm