Simple Algorithm for Google’s Rankings
Perhaps Google’s algorithm isn’t as difficult as we all think?
No, I haven’t been sitting in front of the microwave for too long again. Before you rip me to pieces, give me a few seconds to explain myself!
Possible Technology Limitations
Now, we all know that Google has one of the largest server farms in the world, estimated at upwards of 250,000 individual servers spread worldwide. In spite of this, many people lose sight of the fact that Google only has a finite (albeit large) amount of resources.
If we estimate that Google crawls 100 million+ new pages per day, they are likely to encounter a billion or more new links on a daily basis. I think it is plausible that, given the ‘100 factors’ supposedly composing the algorithm, Google may find itself running short on server power while crunching all the incoming data. For example, many of the ‘factors’ which are assumed to influence an outgoing link’s value depend on characteristics of incoming links. This could continue recursively back through many layers of the page hierarchy. Links are only one example of hard-to-crunch data; undoubtedly there are more costly factors to take into account.
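To make the recursion concrete, here is a minimal sketch of how a link value that depends on incoming links might be computed. It is only an illustration in the style of the publicly described PageRank idea, not Google's actual algorithm; the function name `propagate_link_value`, the `damping` value of 0.85, and the number of `rounds` are all my own assumptions.

```python
def propagate_link_value(links, damping=0.85, rounds=50):
    """Repeatedly pass each page's value along its outgoing links.

    `links` maps a page name to the list of pages it links to.
    All names and parameters here are hypothetical, for illustration only.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    value = {page: 1.0 / n for page in pages}

    for _ in range(rounds):
        # Every page starts each round with a small baseline value.
        new_value = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's value evenly among the pages it links to.
                share = damping * value[page] / len(targets)
                for target in targets:
                    new_value[target] += share
            else:
                # A page with no outgoing links spreads its value over all pages.
                share = damping * value[page] / n
                for other in pages:
                    new_value[other] += share
        value = new_value
    return value


if __name__ == "__main__":
    toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(propagate_link_value(toy_web).items()):
        print(page, round(score, 3))
```

Even in this toy version, every round touches every link once, so the cost grows with the number of links multiplied by the number of rounds. Scale that to a billion new links a day and add further factors per link, and it becomes easier to see where the server power could go.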
Additionally, one needs to consider the latency involved in transmitting data between server farms located on different continents. For instance, data transmitted from Eastern Asia would likely take around 100ms to reach the continental US. Since page information is likely distributed among the various server farms, there could be significant transport delays involved in obtaining the data for a larger algorithm.
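As a rough sanity check on that figure, the sketch below computes a lower bound on latency from distance alone. The numbers are my assumptions, not measurements: light travels through optical fibre at roughly 200,000 km per second, and I am taking about 8,000 km as a ballpark distance between Eastern Asia and the US west coast. Real routes are longer and add switching delays, so actual times would be higher.

```python
# Approximate speed of light in optical fibre (an assumption, about 2/3 of c).
SPEED_IN_FIBRE_KM_PER_S = 200_000.0


def min_one_way_latency_ms(distance_km, speed_km_per_s=SPEED_IN_FIBRE_KM_PER_S):
    """Lower bound on one-way latency in milliseconds for a given distance."""
    return distance_km / speed_km_per_s * 1000.0


def sequential_fetch_time_ms(round_trips, distance_km):
    """Minimum time for a number of request/response exchanges done one after another."""
    return round_trips * 2 * min_one_way_latency_ms(distance_km)


if __name__ == "__main__":
    distance_km = 8_000  # hypothetical ballpark distance, not a measured route
    print(min_one_way_latency_ms(distance_km), "ms one way, at best")
    print(sequential_fetch_time_ms(10, distance_km), "ms for 10 sequential lookups")
```

The point is not the exact value but the multiplication: if an algorithm needs many dependent lookups across continents, the delays stack up quickly, and no amount of extra hardware makes the signal travel faster.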
Remember that a certain proportion of Google’s server farm is not dedicated to their ranking algorithm; much of their hardware contains the finalized results which they serve out. Not only that, much of the hardware contains duplicate information: for instance, there are numerous data centers serving out identical information to search requests in the United States; a similar situation is seen in most foreign countries.
Geniuses and ‘Good’ Algorithms
Cringely’s recent article on PBS once again brought to the forefront one important fact: Google is staffed by genius engineers and computer scientists. Every computer scientist knows that the ‘best’ algorithms are the ones that solve the largest number of potential cases in the fewest steps, in the simplest fashion possible.
A well-designed algorithm conveys a sense of beauty to a computer scientist; there is nothing like taking a huge, ugly algorithm written quickly to solve a problem, and refining it into a short, effective, and quick piece of work. A simple but effective algorithm has an elegance about it that is recognized by all who work with it.
Given the makeup of Google’s employee body, I would suspect work is constantly being done to simplify the Google algorithm while maintaining the same level of effectiveness it currently has. I believe it is quite possible that the algorithm currently in place is much simpler than we have been led to believe. There is also a financial benefit to using a ‘simple’ algorithm: by cutting down the machine time each calculation needs, Google would be able to get more work out of the same hardware.
What are your thoughts? Personally, this is just a theory: until we know better, I am just going to continue with my mental picture of the ‘big’ algorithm, and all the various on-page and off-page factors we traditionally assume they look at.