An exclusive look inside the Google search algorithm
The March 2010 issue of Wired features “an exclusive look at the algorithm that rules the Web.” It’s a fascinating three-page article that reveals much about Google’s tireless drive to improve itself. Most significant, though, are the hints about which hidden ‘signals’ determine the websites that end up at the top of search results, along with the realization that Google’s system is ever-changing.
Web search is a multipart process. First, Google crawls the Web to collect the contents of every accessible site. This data is organized into an index (arranged by word, just like the index of a textbook), a way of finding any page based on its content. Every time a user types a query, the index is combed for relevant pages, returning a list that commonly runs to hundreds of thousands or even millions of results. The trickiest part, though, is the ranking process: determining which of those pages belong at the top of the list.
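The crawl, index, and rank pipeline described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Google does it: the `pages` dictionary stands in for crawled content, the helper names `tokenize`, `build_index`, and `search` are hypothetical, and the ranking step uses raw term counts as a single stand-in signal rather than the many signals a real engine combines.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each word to {page_url: occurrence count}, like a textbook index."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Return pages containing every query word, best-scoring first."""
    terms = tokenize(query)
    if not terms:
        return []
    # Retrieval: intersect the page sets ("posting lists") of all query terms.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))

    # Ranking: a hypothetical one-signal score (total occurrences of the terms).
    def score(url: str) -> int:
        return sum(index[t][url] for t in terms)

    # Sort by descending score, breaking ties alphabetically by URL.
    return sorted(candidates, key=lambda url: (-score(url), url))


# Hypothetical "crawled" pages used only for illustration.
pages = {
    "a.example": "Google ranks pages using many signals",
    "b.example": "Signals signals everywhere: ranking signals matter",
    "c.example": "A recipe for banana bread",
}
index = build_index(pages)
```

Here `search(index, "signals")` finds two pages and puts `b.example` first because it mentions the word three times, while `search(index, "ranking signals")` returns only `b.example`, since `a.example` says "ranks" and this sketch does no stemming. Even this tiny example shows why retrieval is the easy half: deciding the order is where the real judgment lies.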
That’s where the contextual signals come in. All search engines incorporate them, but none has added as many or made use of them as skillfully as Google has. PageRank itself is a signal, an attribute of a Web page (in this case, its importance relative to the rest of the Web) that can be used to help determine relevance. Some of the signals now seem obvious.
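To make "importance relative to the rest of the Web" concrete, here is a minimal sketch of the textbook PageRank formula from Brin and Page's original paper, computed by power iteration: each page's score is a baseline share plus damped contributions from the pages linking to it, split across each linker's outgoing links. The damping factor of 0.85 is the value suggested in that paper; the function name, the four-page example graph, and the choice to spread the score of pages with no outgoing links evenly across all pages are assumptions for illustration. This is not Google's production signal, which the article makes clear keeps changing.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration over a small link graph.

    `links` maps each page to the pages it links to. Pages with no outgoing
    links ("dangling" pages) spread their score evenly over all pages, an
    assumption of this sketch.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Total score currently held by dangling pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the random-jump baseline plus a share of dangling score.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each linking page splits its damped score across its outgoing links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical four-page Web: three pages link to "c", nobody links to "d".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

In this toy graph `c` ends up with the highest score because three pages point to it, `a` comes second largely because the important page `c` links to it, and `d`, which no page links to, keeps only the baseline `0.15 / 4`. The scores always sum to 1, so each one reads as a page's share of importance across the whole graph, which is exactly what lets it serve as one signal among many.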