For ages, site owners have complained that scrapers of their content outrank the original in Google search results. One SEO has now pretty much shown that, if you don’t take precautions, it might not be hard for someone to hijack your search result by copying your content.
Have you ever had your search results hijacked? Scrapers ranking over your own original content? Let us know in the comments.
Dan Petrovic from Dejan SEO recently ran some interesting experiments, “hijacking” search results in Google with pages he copied from original sources (with the consent of the original sources). Last week, he posted an article about his findings, and shared four case studies, which included examples from MarketBizz, Dumb SEO Questions, ShopSafe and SEOmoz CEO Rand Fishkin’s blog. He shared some more thoughts about the whole thing with WebProNews.
First, a little more background on his experiments. “Google’s algorithm prevents duplicate content displaying in search results and everything is fine until you find yourself on the wrong end of the duplication scale,” Petrovic wrote in the intro to his article. “From time to time a larger, more authoritative site will overtake smaller websites’ position in the rankings for their own content.”
“When there are two identical documents on the web, Google will pick the one with higher PageRank and use it in results,” he added. “It will also forward any links from any perceived ’duplicate’ towards the selected ‘main’ document.”
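To make the behaviour Petrovic describes concrete, here is a toy sketch in Python. It is only an illustration of his claim, not Google's actual algorithm: the function name `pick_canonical`, the page fields, and all numbers are hypothetical. Identical documents are grouped by a content hash, the copy with the highest PageRank is chosen as the "main" document, and the links pointing at the duplicates are credited to it.

```python
import hashlib
from collections import defaultdict


def pick_canonical(pages):
    """Toy model: pages is a list of dicts with the hypothetical keys
    'url', 'content', 'pagerank' and 'inbound_links'."""
    # Group pages whose content is byte-for-byte identical.
    groups = defaultdict(list)
    for page in pages:
        digest = hashlib.sha256(page["content"].encode("utf-8")).hexdigest()
        groups[digest].append(page)

    results = []
    for duplicates in groups.values():
        # The copy with the highest PageRank becomes the "main" document.
        main = max(duplicates, key=lambda p: p["pagerank"])
        results.append({
            "url": main["url"],
            # Links to any duplicate are forwarded to the main document.
            "credited_links": sum(p["inbound_links"] for p in duplicates),
            "suppressed": [p["url"] for p in duplicates if p is not main],
        })
    return results


# Made-up example: a copy hosted on a stronger site displaces the original.
pages = [
    {"url": "https://original.example/post", "content": "same text",
     "pagerank": 2, "inbound_links": 5},
    {"url": "https://copy.example/post", "content": "same text",
     "pagerank": 5, "inbound_links": 3},
]
result = pick_canonical(pages)
```

In this made-up example the copy wins because its PageRank is higher, and the original's links end up credited to it, which is the "wrong end of the duplication scale" scenario from the experiments.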
In the MarketBizz case, he set up a subdomain on his own site and created a single page by copying the original HTML and images of the content he intended to hijack. The new page was +1’ed and linked to from his blog. Within a few days, once Google had indexed the new page, it replaced the original in the search results thanks to its higher PageRank.
In the Dumb SEO Questions case, he tested whether authorship helped against a result being hijacked. Again, he copied the content and replicated it on a subdomain, but without copying any media. The next day, the original page was replaced with the new page in Google, with the original being deemed a duplicate. “This suggests that authorship did very little or nothing to stop this from happening,” wrote Petrovic.
In the ShopSafe case, he created a subdomain and replicated a page, but this time the original page contained rel="canonical". The tag was stripped from the copy. The new page overtook the original in search, but it didn’t replace it when he used the info: command. The +1’s were removed after the hijack to see if the original page would be restored, and several days later, the original page overtook the copy, Petrovic explained.
Finally, in the Rand Fishkin case, he set up a page in similar fashion, but this time “with a few minor edits (rel/prev, authorship, canonical)”. Petrovic managed to hijack a search result for Rand’s name and for one of his articles, but only in Australian searches. This experiment did not completely replace the original URL in Google’s index.
“Google is giving exactly the right amount of weight to PageRank,” Petrovic tells WebProNews. “I feel they have a well-balanced algorithm with plenty of signals to utilise where appropriate. Naturally like with anything Google tries to be sparing of computing time and resources as well as storage so we sometimes see limitations. I assure you, they are not due to lack of ingenuity within Google’s research and engineering team. It’s more to do with resource management and implementation – practical issues.”
The Dumb SEO Questions example was interesting, particularly in light of recent domain-related algorithm changes Google has made public. In his findings, Petrovic had noted that a search for the exact match brand “Dumb SEO Questions” brought the correct results and not the newly created subdomain. He noted that this “potentially reveals domain/query match layer of Google’s algorithm in action.”
Petrovic believes there is still significant value to having an exact match domain. “Exact match domains were always a good idea when it comes to brands, it’s still a strong signal when it’s a natural situation, and is now more valuable than ever since Google has sweeped up much of the EMD spam,” he says.
Industry analyst Todd Malicoat also discussed the subject in a recent interview.
Regarding the Fishkin experiment, Petrovic tells us, “Google’s perception of celebrity status or authority are just a layer in the algorithm cake. This means that if there is a strong enough reason Google will present an alternative version of a page to its users. There goes an idea that Wikipedia is hardcoded and shows for everything.”
When asked if freshness played a role in his experiments, he says, “Yes. Freshness was a useful element in my experiments, but not the key factor in the ‘overtake’ – it’s still the links or should I say ‘PageRank’. I know this surprised a lot of people who were downplaying PageRank for years and making it lame to talk about it in public.”
“This article was me saying ‘stop being ignorant,’” he says. “PageRank was and is a signal, why would you as an SEO professional ignore anything Google gives you for free? The funniest thing is that people abandon PageRank as a ridiculous metric and then go use MozRank or ACRank as an alternative, not realising that the two do pretty much the same thing, yet [are] inferior in comparison.”
“To be fair, both are catching up with real PageRank, especially with Majestic’s ‘Flow Metrics’ and the growing size of SEOmoz’s index,” he adds.
Petrovic had some advice for defending against potential hijackers: use rel="canonical" on your pages, use authorship, use full URLs for internal links, and monitor your content with services like Copyscape or Google Alerts, then act quickly and request removals.
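Two of those precautions can be checked mechanically. The following is a minimal sketch using only Python's standard library; the class name `DefenceAudit`, the function `audit`, and the sample HTML are hypothetical and not taken from Petrovic's article. It reports whether a page declares a canonical URL and lists any internal links written as relative paths, which a verbatim copy would presumably resolve against the scraper's own domain instead of pointing back at the original.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Link targets that are not page URLs and should not be flagged.
NON_PAGE_PREFIXES = ("#", "mailto:", "tel:", "javascript:")


class DefenceAudit(HTMLParser):
    """Collects the canonical URL and any relative links found in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.relative_links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonical = href
        elif tag == "a":
            # A link with no host part is relative to whatever site serves it.
            is_relative = not urlparse(href).netloc
            if is_relative and not href.startswith(NON_PAGE_PREFIXES):
                self.relative_links.append(href)


def audit(html):
    """Return (canonical_url_or_None, list_of_relative_links)."""
    parser = DefenceAudit()
    parser.feed(html)
    parser.close()
    return parser.canonical, parser.relative_links


# Hypothetical page used only to demonstrate the check.
sample = (
    '<html><head><link rel="canonical" href="https://example.com/post"></head>'
    '<body><a href="/about">About</a>'
    '<a href="https://example.com/contact">Contact</a>'
    '<a href="#top">Top</a></body></html>'
)
canonical, relative = audit(sample)
```

A missing canonical URL or a non-empty list of relative links would point to pages worth revisiting. Authorship markup and content monitoring are not covered by this sketch.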
He also wrote a follow-up to the article where he talks more about “the peculiar way” Google Webmaster Tools handles document canonicalization.
So far, Google hasn’t weighed in on Petrovic’s findings.
What are your thoughts about Petrovic’s findings? Share them in the comments.