Friday, December 17, 2010

Booookie Crisp!

Use of the word "poop" in books since 1800.
So if you're anything like me—well, first of all, you've probably got a mild or even severe stomachache from a week of eating any and all holiday-related food. So to that I say, try some Tums.

But if you're anything like me when it comes to books, you've spent the better part of the last 12 hours nerding out. This would be because Google has rather quietly released the first of what I imagine will be many byproducts of its multi-year book-scanning endeavor—a database that lets users search for a word or phrase to see its usage in some 5.2 million books dating back to the year 1800. There's even a nifty little website.

In the interest of full disclosure, I'll admit I haven't historically been a big fan of Google's book project. In fact, four years ago I wrote a diatribe in my college newspaper railing against some of the perceived implications (I was the opinions editor and therefore free to reserve half-page spreads for my own unsolicited ranting). Now, I stand by many of those criticisms today—some of which are the same arguments leveled against e-readers—and I remain wary of linking books to one another the way we do content online. But I have to admit that Google's latest endeavor is pretty freaking awesome, and outside the scope of what I myself would have imagined the project being used for.

Plenty of people in media have been using the database for good—the Times did an interesting analysis of the words "tuberculosis" and "consumption" (another word for tuberculosis, for those of you who haven't seen Moulin Rouge 900 times) and I've seen more than one assessment of how often the words "women" and "men" were used. Me, I searched for "marijuana" (usage soared after 1960), "fuck" (also grew in popularity after 1960, though it had a good run pre-1820 as well) and "racism" (essentially didn't exist as a concept until the last 30 years). But hey, I never claimed to be mature.

Now the database only includes about 4% of the world's books—yes, there have been 129 million books printed, which would take me approximately 129 million weeks to read, or 2.5 million years—but it's still wildly fascinating, and only stands to get more informative as Google continues to scan and incorporate new texts (ignoring for the time being that there are lawsuits pending against the company for its efforts).

At the end of the day, despite my wariness of Google's book-scanning, I find myself torn between two warring nerd inclinations: that of the reader and that of the data lover (lest you forget, I religiously maintain a data-centric Twitter feed for work). Although digitizing the world's book supply stands to change some of what I consider the fundamentals of reading, it may also do some good. After all, who doesn't want to know that use of the word "poop" in books really hit its peak in 1930?
