Zipf's law:

The linguists at Language Log have been poking fun at a BBC story suggesting that British teens have poor vocabularies and that Britain is becoming a nation of "Vicky Pollards." The main posts on the subject are here and here; an extra post is here; and a (very partial) retraction of their original mockery (which was substantially fair, though the retraction goes into greater theoretical detail) is here.

By the way, who is Vicky Pollard? The Language Loggers suggest looking here, here, here, and here. I've only looked at the fourth of those links, but it's pretty funny.

In any event, the basic moral is that the BBC doesn't know what it's talking about. For one thing:

The Vicky character — a broad satire of the accent, dress and manners of British lumpen-teen females — is portrayed as hyper-verbal. One of the basic Vicky bits is her jabbering rapidly on automatic pilot, saying far more than she should. Yet the BBC sees her as someone who is unable to communicate due to an inadequate word stock, not someone who over-communicates with socially inappropriate content, accent, word choice and sentence structure. This is another piece of evidence that journalists these days are incapable of elementary observation and common-sense description, at least when it comes to speech and language.

For another thing, the story generated the assertion that "the top 20 words used [by British teens] . . . account for around a third of all words." Now, you're supposed to read that and imagine "um," "like," "y'know" . . . but it turns out that everyone does the same thing. Having the top 20 words account for a third of all your words is a normal distribution. (That's "normal" in the "ordinary" sense, not the "Gaussian" sense.) Take a look at Zipf's Law, and then read this lovely article about the Oxford English Corpus, where you can find the 100 commonest English "words" (where "words" basically means "lemmas," if you find that helpful).

Especially funnily, the Language Log folks analyzed a text by the professor responsible for the statistic, and found that he, too, followed the same 20/one-third law! Not that the professor is really to blame; of course, his research was badly mangled by the media.
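If you want to check the 20/one-third figure on a text of your own, here is a minimal Python sketch. The function names (top_k_share, zipf_top_k_share) are mine, not anything from Language Log or the Oxford English Corpus; the tokenizer is a crude regular expression that counts surface forms rather than lemmas; and the second function assumes the idealized form of Zipf's Law (frequency proportional to 1/rank) with a vocabulary size that you have to pick yourself.

```python
import re
from collections import Counter


def top_k_share(text, k=20):
    """Fraction of all word tokens accounted for by the k most frequent words.

    Tokenization is deliberately crude (lowercased runs of letters and
    apostrophes), so this counts surface forms, not lemmas.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    top_total = sum(n for _, n in counts.most_common(k))
    return top_total / len(words)


def zipf_top_k_share(k, vocab_size):
    """Share of tokens the top k words would cover under idealized Zipf.

    Assumes frequency is proportional to 1/rank and that the vocabulary
    has exactly vocab_size distinct words; both are simplifying assumptions.
    """
    def harmonic(n):
        return sum(1.0 / rank for rank in range(1, n + 1))

    return harmonic(k) / harmonic(vocab_size)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    # Share of the single most frequent word in the toy sample.
    print(top_k_share(sample, k=1))
    # Idealized Zipf share of the top 20 words, assuming 50,000 distinct words.
    print(zipf_top_k_share(20, 50_000))
```

Under those assumptions, a hypothetical 50,000-word vocabulary puts the top 20 words at a bit under a third of all tokens, which is the point: the statistic falls out of the shape of the distribution, not out of anybody's impoverished word stock.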

UPDATE: A commenter quibbles with my use of the word "commonest." In the comments, I quote the Oxford English Corpus guys using the word, and also cite uses of it by Byron and Jonathan Swift.

More about language:

It's a myth that Eskimos have a huge number of words for snow (i.e., many more than we have). However, Eskimologists (not the same as eschatologists) confirm that, at least among the West Greenland Inuit, there's a single word for "They were wandering about gathering up lots of stuff that smelled like dead fish."
