Google "Panda" Algorithm Update - What's Known & What's Possible

Google’s recent algorithm update aimed at improving the quality of search results has captured a great deal of attention – both positive and negative. The general consensus seems to be that the results are in fact better now, but still not perfect. Perfection will likely never be achieved, but there are still some glaring criticisms out there about Google’s most recent attempt.

Having had some time to reflect, what is your opinion of the update? Let us know in the comments.

Despite the overall improvement in search quality, many sites have suffered the consequences of the update – some deservedly and others maybe not so much. While Google will never reveal its secret recipe in its entirety, there are plenty of clues out there, and even facts that Google will share. You can criticize Google’s mystique all you want, but there’s no denying that they do communicate with the webmaster community to a great extent, even if they don’t always tell you everything you want to hear.

Google’s Matt Cutts and Amit Singhal – two of the most instrumental voices in the recent update – shared some clues and insights in an interview with Wired this week. Before we get to specifics, there were some interesting things mentioned by the two that are worth noting. For example, Caffeine, which sped Google’s indexing, led to a flood of content – both good and bad. This seems to have helped the “shallow” kinds of content that this most recent update targeted – not stuff that is quite spam, but…well, shallow. We also learned that Google calls the update “Panda”.

They revealed that prior to the update, they sent out documents to outside testers/raters, and asked them questions about quality. It would be interesting to know who these raters were, but no such luck there. Users were asked things like whether they would feel comfortable giving a site their credit card info or giving medicine from the site to their kids (I wonder if anyone was asked if they felt comfortable getting their brain cancer information from a freelance eHow writer with no credentials in the field), whether they considered the site to be authoritative, whether it would be ok in a magazine, whether it has “excessive” ads, and other questions. It would be great to be able to know more of those questions, but we can only work with what Google has revealed.

“And based on that, we basically formed some definition of what could be considered low quality,” Singhal is quoted as saying.

“We actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side,” said Cutts. “And you can really see mathematical reasons…”

“I got an e-mail from someone who wrote out of the blue and said, ‘Hey, a couple months ago, I was worried that my daughter had pediatric multiple sclerosis, and the content farms were ranking above government sites,'” Cutts is later quoted as saying. “Now, she said, the government sites are ranking higher. So I just wanted to write and say thank you.'”

Again, why is eHow still ranking for “level 4 brain cancer”?

Google says it still looks at feedback, and Cutts even said that if someone has a specific question about why a site dropped, he thinks it’s “fair and justifiable and defensible to tell them why that site dropped.” He also said that Google’s most recent algorithm contains signals that can be gamed (hence the lack of full transparency). In other words, it can still be optimized for.

Finally, the site Suite101, which data from SearchMetrics lists as the biggest loser in percentage (in its organic performance index), was brought up in the interview. Suite101 and eHow are often compared and labeled as “content farm” type sites. When asked why Suite101 took a much bigger hit than eHow, Cutts simply said, “I feel pretty confident about the algorithm on Suite 101.”

It would be very helpful to understand the differences Google sees between these two sites. It doesn’t seem very clear by looking through the sites that there are obvious differences in quality. I’m sure it varies on both.

We reached out to Suite101 a few days ago for comment on the update and its impact, but have yet to receive a response. I’m even more interested to hear what they have to say, now that these comments have come out. Update: Suite101 referred us to an open letter from CEO Peter Berger to Google’s Matt Cutts.

CEO Peter Berger stressed the importance of quality in content when we spoke with him last year.

“Every week, several thousand people apply to become Suite101 writers,” he told us. “While we only accept a portion of applicants based on our non-negotiable quality standards, we do have many successful writers on our site who do not consider themselves ‘writers’.”

“We see it as Suite101’s mission to enable people – anyone who can write well and with deep understanding of a subject – to achieve their goals,” he said. “These might be earning money, addressing large audiences, building up a personal professional brand, or simply enjoying creative freedom in a nurturing, peer-oriented environment.”

Results from people with a deep understanding of a subject should lend themselves to quality. Whether or not Suite101 delivers on this is open for debate. Clearly Google doesn’t think so, practically making the site the poster-child of what not to do. The mysteries continue…

What we know Google is looking at with the Panda update:

– User comfort level in the trust area (think credit card/medicine comments)

– Is it considered authoritative (this would imply some indication of expertise on topics covered, I would think)

– Is the content quality good enough for print? (I’ve seen plenty of crap printed)

– Are there too many ads? (How many are too many, and does the ad network matter?)

– We know Google has its definition of what could be considered low quality

– Google uses a “classifier” to draw a line in the sand

– We know that so far, Google has not used indications from the Chrome Extension (emphasis on so far. Google hinted in the past that this data could potentially be used to tweak the algorithm).

– Google looks at feedback, at least to some extent

– Based on comments from Cutts, Google will tell you why your site dropped (getting that communication flow going may not be the easiest thing to do, but I have personally witnessed Cutts sit down with someone at a conference and look at their site with them.)

– The algorithm can still be gamed. It can still be optimized for. (If you were hit by the update, there are things you can do to get back in Google’s good graces. In other words, you’re not necessarily banned just because of your brand.)

– Most of the changes in rankings will be done algorithmically, but Google will take manual action in some instances (see JC Penney)

– If you use any auto-generated content, keep it separated from the original high-quality stuff, and block it from search engines. Google’s John Mu said recently, “If you do have such high-quality, unique and compelling content, I’d recommend separating it from the auto-generated rest of the site, and making sure that the auto-generated part is blocked from crawling and indexing, so that search engines can focus on what makes your site unique and valuable to users world-wide.”

This is basically in line with another statement from Google obtained by Search Engine Land: “Sites that believe they have been adversely impacted by the change should be sure to extensively evaluate their site quality. In particular, it’s important to note that low quality pages on one part of a site can impact the overall ranking of that site. Publishers who believe they’ve been impacted can also post in our webmaster forums to let us know. We will consider feedback from publishers and the community as we continue to refine our algorithms.”
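John Mu's advice about blocking the auto-generated part of a site from crawling can be checked with a few lines of Python. This is only a minimal sketch: the `/auto/` path, the `example.com` URLs, and the `is_crawlable` helper are hypothetical, made up for illustration, and nothing here comes from Google. It uses the standard library's robots.txt parser to confirm that a rule set keeps crawlers out of one section while leaving the original articles open.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The /auto/ directory is an assumed location
# for auto-generated pages, not something named in the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /auto/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Auto-generated section: expected to be blocked by the rule above.
print(is_crawlable(ROBOTS_TXT, "https://example.com/auto/page-1"))
# Original content: expected to stay crawlable.
print(is_crawlable(ROBOTS_TXT, "https://example.com/articles/guide"))
```

Keep in mind that robots.txt only governs crawling. Keeping pages out of the index is usually handled separately, for example with a `noindex` robots meta tag, so treat this check as covering just one half of the recommendation.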

If you can think of anything else that is “known” about this update, please feel free to comment.

We won’t jump to any conclusions, but here are…

Some things that are possible that may be worth considering:

– Old-fashioned design may play a role. Simply from the aesthetic point of view, this may make a site appear less trustworthy (less likely that consumers will be willing to give up their credit card info). We wonder if this played a role in the EzineArticles and Foner Books examples we looked at.

– There is possibly a threshold that can be crossed for what is considered too many ads before your site gets points taken off for quality. Some have tried reducing the number of ads (again, see EzineArticles) to try and boost rankings.

– Duplicate content (to some degree) may play a role in this recent update. EzineArticles, again, is a prime candidate for this. Articles from the site are published other places – probably the majority of the content from the site is duplicated at other places (besides just scrapers). eHow content is uniquely written for eHow. There are plenty of people that will suggest much of this content is rewritten based on other existing articles, but that’s beside the point. The content itself is unique to eHow (again, scrapers aside).

Other sites like Business Insider, The Huffington Post, and even the New York Times, CNN, and the Wall Street Journal will syndicate content from other blogs, but this duplicate content does not make up the majority of the content from these sites, and this is probably why it’s not frowned upon in these cases. Even WebProNews has had a blog partner program in place for years, in which we syndicate select posts from our partners, but this content has never dominated WebProNews. It’s never been the majority of what we publish, but a small percentage.

– Excessive amounts of very short articles may be a factor taken into consideration, because if that’s the majority of what you put out, the majority of your content is likely “shallow”. Now sometimes, short posts are sufficient. Sometimes there’s just not that much to say, but if these kinds of posts dominate, there’s a good chance there IS more to say about a lot of it, and someone else probably IS saying it, which makes those people better candidates for better rankings.

– eHow may still be ranking well at least partially because it has established a lot of backlinks over time. The nature of these links could come into play. There is some interesting discussion about this in a WebmasterWorld thread.

– Better, consistent page structure could also play a role, as brought up in that same thread. Look at eHow vs. HubPages (which was hit by the update).

– Update: PotPieGirl.com has some very interesting data, after running a test on seven key phrases that attract large amounts of spammy content. This might be very telling of at least one aspect of the Panda update. The following chart says it all. Look at the difference in percentages between EzineArticles and eHow.

Another dataset looks at the same phrases for articles just from the last month:

“In the last month, Ezine Articles has had close to 39,000 urls found/crawled in the Google index that have one of these 7 phrases on them. That means that 2.82% of the EzineArticles.com urls Google has found/crawled in the last month have this phrase on them,” says Jennifer (Pot Pie Girl), who put this data together. “That is almost 39 THOUSAND web pages in the Google index in the past month with one of those 7 phrases on them – from ONE SITE.”
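To put those two figures in perspective, a little back-of-the-envelope arithmetic gives the implied total. This is a rough sketch only: the quote says "close to 39,000," so the input is rounded and the result is an estimate rather than a number reported by Pot Pie Girl. The `implied_total_urls` helper is a made-up name for illustration.

```python
def implied_total_urls(matching_urls: int, share: float) -> float:
    """Estimate the total URL count from a matching count and its share of the total."""
    return matching_urls / share


# Figures from the quote: roughly 39,000 matching URLs, said to be 2.82% of the
# EzineArticles URLs Google found/crawled in the month.
estimate = implied_total_urls(39_000, 0.0282)
print(f"Implied URLs found/crawled in the month: about {estimate:,.0f}")
```

That works out to roughly 1.38 million EzineArticles URLs found or crawled in a single month, which helps explain how a 2.82% share still adds up to tens of thousands of pages.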

More on what Google Algorithm Update casualties have to say.

If you have any insight into more things Google may be looking at (specific to this update), discuss these in the comments as well.


WebProNews is a leading publisher of business and technology email newsletters and websites.