Matt Cutts Talks About A Typical Day In Spam-Fighting
The latest “Webmaster Help” video from Google is an interesting (and long) one. Google webspam king Matt Cutts talks about a day in the life of someone on the webspam team.
Here’s the set of questions he answers, quoted verbatim:
What is a day in the life of a search spam team member like? What is the evolution of decisions in terms of how they decide which aspects of the search algorithm to update? Will certain things within the algorithm never be considered for removal?
He begins by noting that the team is made up of both engineers and manual spam fighters, and he addresses each group separately.
First, he gives a rough idea of a manual spam-fighter’s day.
“Typically it’s a mix of reactive spam-fighting and proactive spam-fighting,” he says. “So reactive would mean we get a spam report or somehow we detect that someone is spamming Google. Well, we have to react to that. We have to figure out how do we make things better, and so a certain amount of every day is just making sure that the spammers don’t infest the search results, and make the search experience horrible for everyone. So that’s sort of like not hand to hand combat, but it is saying ‘yes’ or ‘no’ this is spam, or trying to find the spam that is currently ranking relatively well. And then in the process of doing that, the best spam-fighters I know are fantastic at seeing the trends, seeing the patterns in that spam, and then moving into a proactive mode.”
Proactive mode would involve figuring out how the spammers are ranking so highly and which loophole they’re exploiting, then trying to fix the problem at its root. This could involve interacting with engineers or just identifying specific spammers.
“Engineers,” he says. “They absolutely look at the data. They absolutely look at examples of spam, but your average day is usually spent coding and doing testing of ideas. So you’ll write up an algorithm that you think will be able to stop a particular type of spam. There’s no one algorithm that will stop every single type of spam. You know, Penguin, for example, is really good at several types of spam, but it doesn’t tackle hacked sites, for example. So if you are an engineer, you might be working on, ‘How do I detect hacked sites more accurately?’”
He says they would come up with the best techniques and signals they can use, and write an algorithm that tries to catch as many hacked sites as possible while safely preserving the sites that are innocent. Then they test it, either by running it across the index or by running an experiment with ratings of URLs, and see if things look better. Live traffic experiments, which show what people click on, help them identify the false positives, he says.
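Cutts doesn’t describe Google’s actual tooling, but the trade-off he outlines (catch as many hacked sites as possible without flagging innocent ones) can be illustrated with a small sketch. Everything here is hypothetical: the `RatedUrl` record, the detector scores, the human ratings, and the thresholds are made up for illustration and are not Google’s signals or methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedUrl:
    url: str
    score: float     # hypothetical detector score in [0, 1]
    is_hacked: bool  # hypothetical human rating of the URL


def evaluate(rated, threshold):
    """Return (precision, recall, false-positive URLs) at a score threshold."""
    flagged = [r for r in rated if r.score >= threshold]
    true_pos = [r for r in flagged if r.is_hacked]
    # Innocent sites the detector would wrongly flag.
    false_pos = [r.url for r in flagged if not r.is_hacked]
    total_hacked = sum(1 for r in rated if r.is_hacked)
    precision = len(true_pos) / len(flagged) if flagged else 1.0
    recall = len(true_pos) / total_hacked if total_hacked else 1.0
    return precision, recall, false_pos


# Made-up sample of rated URLs.
SAMPLE = [
    RatedUrl("a.example/pharma", 0.95, True),
    RatedUrl("b.example/blog", 0.80, False),
    RatedUrl("c.example/shop", 0.70, True),
    RatedUrl("d.example/news", 0.20, False),
    RatedUrl("e.example/forum", 0.40, True),
]

if __name__ == "__main__":
    for t in (0.5, 0.9):
        p, r, fps = evaluate(SAMPLE, t)
        print(f"threshold={t}: precision={p:.2f} recall={r:.2f} "
              f"false_positives={fps}")
```

In this toy data, a lower threshold catches more hacked sites but also flags an innocent one, while a higher threshold flags no innocent sites and misses most of the hacked ones. That is the kind of balance the testing step Cutts describes would be meant to check.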
On the “evolution of decisions” part of the question, Cutts says, “We’re always going back and revisiting, and saying, ‘Okay, is this algorithm still effective? Is this algorithm still necessary given this new algorithm?’ And one thing that the quality team (the knowledge team) does very well is trying to go back and ask ourselves, ‘Okay, let’s revisit our assumptions. Let’s say if we were starting from scratch, would we do it this way? What is broken, or stale, or outdated, or defunct compared to some other new way of coming up with this?’ And so we don’t just try to have a lot of different tripwires that would catch a lot of different types of spam, you try to come up with elegant ways that will always catch spam, and try to highlight new types of spam as they occur.”
He goes on for about another three minutes after that.