Google has released a new section on its Inside Search site called "How Search Works," which includes an animated graphic explaining just that. It also provides a "view into major search algorithms and features," a 43-page document about the guidelines for search raters, a slideshow about spam removal, graphs about spam, and a list of policies that explain when Google will remove content.
The animation is only available in English, but there's also a text version in 43 languages.
On the site, Google lists the projects it "constantly tunes and refines": Answers, Autocomplete, Books, Freshness, Google Instant, Images, Indexing, Knowledge Graph, Mobile, News, Query Understanding, Refinements, SafeSearch, Search Methods, Site & Page Quality, Snippets, Spelling, Synonyms, Translation and Internationalization, Universal Search, User Context, and Videos.
For "Site & Page Quality," Google says, "Uses a set of signals to determine how trustworthy, reputable, or authoritative a source is. (One of these signals is PageRank, one of Google’s first algorithms, which looks at links between pages to determine their relevance.)"
"User Context" is described as follows: "Provides more relevant results based on geographic region, Web History, and other factors."
The "Live Spam Screenshots" section is interesting. It shows pages that were recently removed from search results, for example one removed 34 minutes earlier. At the time of writing, it lets you browse 56 examples (all removed 33 to 34 minutes earlier) in a slideshow. When you click "next," you're presented with a warning that says, "These screenshots are generated automatically and are not manually filtered. While uncommon, you may see offensive, sexually explicit, or violent content."
There is also a note under the slideshow, which says, "We’ve removed some pornographic content and malware from this demo, but otherwise this is an unfiltered stream of fresh English examples of 'pure spam' removals."
Google then runs through the various types of spam: cloaking and/or sneaky redirects, hacked sites, hidden text and/or keyword stuffing, parked domains, pure spam, spammy free hosts and dynamic DNS providers, thin content with little or no added value, unnatural links from a site, unnatural links to a site, and user-generated spam. The site also includes a graph showing the number of domains affected by a manual action over time, broken down by spam type.
This is followed by a list of spam-fighting milestones and several other graphs about webmaster notifications and reconsideration requests.
Finally, the site gives you a brief overview of Google's policies on: access to information, algorithms over manual action, exceptions lists, fighting spam and malware, transparency for webmasters, preventing identity theft, legal removals, fighting child exploitation, shocking content, and SafeSearch.