
The Unreasonable Effectiveness of Data

2009-03-27 00:36:35 by Martynas Jusevičius

It is nice to see Google embracing the Semantic Web in a recent paper by their top research scientists, “The Unreasonable Effectiveness of Data”. Here is an excerpt from the summary:

“So, follow the data. Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data. Represent all the data with a nonparametric model rather than trying to summarize it with a parametric model, because with very large data sources, the data holds a lot of detail. For natural language applications, trust that human language has already evolved words for the important concepts. See how far you can go by tying together the words that are already there, rather than by inventing new concepts with clusters of words. Now go out and gather some data, and see what it can do.”


Comments (1)

price per head

2010-01-31 20:43:48 by price per head

What upset me about that paper is not that they say “oh sure, structure is great, but look over here: there is a goldmine in all the sand” (which fully resonates with me), but that they phrased it as a fight, deterministic vs. statistical, trying to convince people that adding structure is not the way to go and is basically a global waste of research resources.
