Amazon's "Statistically Improbable Phrases (SIPs)":

Book listings on Amazon now list, among other things, "Statistically Improbable Phrases (SIPs)". For instance, the listing for Bernard Malamud's "The Natural" contains these SIPs: "bassoon case, dugout steps".

Can someone enlighten me as to why Amazon provides this particular bit of information?

Brett A. Thomas (mail) (www):
They have a page explaining it. I think the theory is, "this phrase occurs a lot in this book, but not in other books, therefore it might be something really unique to this book." I'd think it'd work better for nonfiction books (especially those where the author comes up with catch phrases) than for fiction works. It also might be useful to get the gestalt of the way a book covers a particular topic (see, for example, the phrases under Guns, Germs and Steel).
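
To make that theory concrete, here is a minimal sketch of one way such a score could work. It is not Amazon's actual method, which this thread does not describe. It assumes two-word phrases, a plain ratio of the phrase's rate in the book to its rate in a comparison corpus, and add-one smoothing. The names `bigrams` and `improbable_phrases` and the toy texts are made up for illustration.

```python
from collections import Counter
import re

def bigrams(text):
    """Split text into lowercase words and return adjacent two-word phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]

def improbable_phrases(book, corpus, top_n=3, min_count=2):
    """Rank phrases that are frequent in `book` but rare across `corpus`.

    Hypothetical scoring: (rate in book) / (smoothed rate in corpus).
    """
    book_counts = Counter(bigrams(book))
    corpus_counts = Counter()
    for other in corpus:
        corpus_counts.update(bigrams(other))
    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())
    scores = {}
    for phrase, count in book_counts.items():
        if count < min_count:
            continue  # ignore phrases the book itself barely uses
        book_rate = count / book_total
        # Add-one smoothing so a phrase unseen in the corpus does not divide by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = book_rate / corpus_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy texts, invented for this example.
book = "He carried the bassoon case up the dugout steps. The bassoon case was heavy."
corpus = ["The bassoon was loud.", "He carried the bag."]
print(improbable_phrases(book, corpus))
```

On this toy input, "bassoon case" ranks above "the bassoon" because the latter also shows up in the comparison corpus.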
7.7.2005 4:40pm
Paul Gowder (mail):
Yeah, it might be a good way to get out from under the publisher's marketing and see what's really in the book.
7.7.2005 4:45pm
Dave Justus (mail) (www):
I think they do it because they can.
7.7.2005 5:13pm
Justin Kee (mail):
I would suspect that Amazon is using the SIPs as a heuristic to help people search for a particular title when they lack the usual reference information. For example, there may be a book that you read as a child that contained a unique phrase, say "bubblegum balloons", but you do not recall the title, author or publisher. If the phrase is unique to the book or very infrequently printed, the use of a SIP will reduce the number of search iterations by potentially several orders of magnitude over a typical general-to-specific search. I suspect that this method generates additional sales volume for Amazon, as I have found and purchased titles using the method.

Similarly, you can search for a song title, artist and album from a snippet you happen to hear on the radio by searching song lyrics for the phrase you heard. I do not think it would work well with lyrics containing "i love you", but it would for "a walk on the wild side".
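
A rough sketch of why a rare phrase shortens the search: if each phrase maps directly to the titles that contain it, a single lookup replaces many rounds of narrowing. This is only an illustration and assumes nothing about how Amazon stores its data. `build_phrase_index`, `titles_for` and the phrase-to-title data are hypothetical, loosely based on examples mentioned in this thread.

```python
from collections import defaultdict

def build_phrase_index(sips_by_title):
    """Invert a {title: [phrases]} mapping into {phrase: {titles}}."""
    index = defaultdict(set)
    for title, phrases in sips_by_title.items():
        for phrase in phrases:
            index[phrase.lower()].add(title)
    return index

def titles_for(index, phrase):
    """Return the titles containing `phrase`, sorted; empty list if none."""
    return sorted(index.get(phrase.lower(), set()))

# Illustrative data only, not taken from Amazon's listings.
sips_by_title = {
    "The Natural": ["bassoon case", "dugout steps"],
    "Catch-22": ["big fat mustache"],
    "Traci Lords: Underneath It All": ["big fat mustache"],
}
index = build_phrase_index(sips_by_title)
print(titles_for(index, "dugout steps"))
print(titles_for(index, "Big Fat Mustache"))
```

A common phrase such as "i love you" would map to a huge set of titles and narrow nothing, which is the point made above about lyrics.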
7.7.2005 5:19pm
Andrew Kvochick (mail):
You can also use it to glean the author's probable influences or, at least, other things you may want to read. For example, apparently only Dr. Mary Ruwart and St. Thomas Aquinas say "honor our neighbor".

Given the other things that Amazon does, I imagine that Mr. Justus is right: they do it because they can, and they think people might find a use for it.
7.7.2005 6:23pm
Andrew Kvochick (mail):
Aha! Just after I posted I saw the new "Books on Related Topics" link, which is apparently cross-referencing SIPs.
7.7.2005 6:28pm
Zywicki (mail):
Man, you guys are good. I see now you can even click on particular SIPs to get other books that use those same SIPs to find similar books (for The Natural, as you might expect "dugout steps" turns out to be a much more useful SIP than "bassoon case," which oddly enough, is mentioned more than twice as often as "dugout steps"...).

Thanks--this is actually kind of cool (and perhaps even useful)!
7.7.2005 6:53pm
Same reason dogs lick their balls - because they can.
7.8.2005 1:30pm
Richard Bellamy (mail):
Coming soon to a bookstore near you:

The Greatest American Novel, by Richard Bellamy. Our hero invents a combination nutrition drink/deodorant called Old Sport. He tries to market it, but no one will buy it because of his big fat mustache. Despondent, he goes to visit an old war buddy who gives him his lucky green bullet that he should use to shoot the least black boy he can find. Doing this, he finds some helicopter screws at the boy's body, which he uses to save the life of one male goat.

I don't know if it'll be any good, but I'm sure it'll gather a lot of interest.
7.8.2005 2:39pm
A F:
Westlaw should do this for federal cases. But they won't, because it seems Westlaw has little motivation to come up with creative ideas to make legal research easier.
7.8.2005 3:56pm
Zywicki (mail):
The SIPs for "Big Fat Mustache" are my favorite:

10 references in Catch-22 by Joseph Heller

1 reference in Traci Lords: Underneath It All by Traci Lords

Don't see those two particular books listed together very often. Naturally, I'm more curious about its contextual use in the latter volume...
7.10.2005 10:06pm
msierra (mail):
I worked at a publisher (O'Reilly) where SIPs were used as a handy way to search the web for pirated versions of their books.
While SIPs are assumed to be exclusive to a single work, there's also a potential value in collating books that share a large set of relatively improbable phrases.
E.g., you could cluster similar works of pure legal theory, and distinguish that subdomain from more popular works discussing legal issues.
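
One simple way to sketch that collation idea is to compare the sets of improbable phrases pairwise and keep the pairs that overlap heavily. The snippet below uses Jaccard similarity, which is my choice for illustration and not something O'Reilly or Amazon is known to use. The titles, phrases, the 0.3 threshold and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Share of phrases common to both sets, out of all phrases in either."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_pairs(sips_by_title, threshold=0.3):
    """Return (title, title, score) for every pair at or above `threshold`."""
    titles = sorted(sips_by_title)
    pairs = []
    for i, first in enumerate(titles):
        for second in titles[i + 1:]:
            score = jaccard(sips_by_title[first], sips_by_title[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs

# Made-up titles and phrases, for illustration only.
sips_by_title = {
    "Theory A": {"rule of recognition", "original position", "veil of ignorance"},
    "Theory B": {"rule of recognition", "veil of ignorance", "primary rules"},
    "Popular C": {"courtroom drama", "jury duty"},
}
print(similar_pairs(sips_by_title))
```

Here the two theory titles share two of four distinct phrases and are paired, while the popular title shares none and stays apart.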
7.19.2005 2:45pm