Google Books database changes how we analyse printed words

The vast ocean of data in Google’s digitized library of books is ready for your analysis, thanks to the Google Books Ngram Viewer. The software provides access to a database constructed by scientists at Google, Harvard, MIT, and Encyclopedia Britannica. It contains Chinese, English, French, German, Russian, and Spanish phrases and terms culled from 4% of all books ever published on Earth.

While the raw text has not been made available for fear of copyright infringement, 2 billion words from 5.2 million books are searchable by everyday folks like you and me. If you want to do deeper research and a more thorough analysis, go ahead and grab some files for yourself from the long list of downloadable ngram data sets.
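If you do grab a file, here is a minimal sketch of how you might tally one phrase by year. It assumes the downloaded files are tab-separated with the columns ngram, year, match count, page count, and volume count; check the data set's own documentation before relying on that layout. The sample rows and the `yearly_counts` helper are made up for illustration and are not real counts from the database.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, invented for illustration; not real counts.
# Assumed column layout: ngram, year, match_count, page_count, volume_count.
SAMPLE = (
    "Michael Jackson\t1975\t10\t9\t5\n"
    "Michael Jackson\t1985\t120\t100\t40\n"
    "Michael Johnson\t1985\t7\t7\t3\n"
    "Michael Jackson\t1995\t300\t250\t90\n"
)


def yearly_counts(lines, phrase):
    """Sum the match counts per year for one exact phrase."""
    counts = defaultdict(int)
    # QUOTE_NONE keeps quote characters inside ngrams as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        ngram, year, match_count = row[0], row[1], row[2]
        if ngram == phrase:
            counts[int(year)] += int(match_count)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    totals = yearly_counts(io.StringIO(SAMPLE), "Michael Jackson")
    for year, count in totals.items():
        print(year, count)
```

To run it against a real download, pass an open file handle in place of the `io.StringIO` sample. Because the function reads one line at a time, it should cope with large files without loading them into memory.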

The data allows us to look at historical text in ways never previously possible. Trace the arc of political trends, measure the growth and death of artistic movements, or follow Michael Jackson’s rise in popularity during the last quarter of the 20th century. Your imagination is the only limit. Note, though, that the database includes book text only, not periodical content, so day-to-day trends and popular subjects in the news will not show up.

Via WSJ
