Copyscape Searches For Scraped Content

    August 30, 2005

Having your content stolen or scraped for search engine purposes can lead to duplicate content penalties, which can result in removal from a search engine’s index. And even if your site escapes being punished, discovering that someone else stole your hard work and is presenting it as original content cannot be pleasing.

Because screen scraping is one of the weapons of choice for the black hat SEO folks, keeping track of your content, especially if you have a content-rich site that’s been optimized for search purposes, is an important part of the day-to-day webmaster tasks.

Enter Copyscape, a site that searches for content that is identical to yours. To use Copyscape, enter the URL you are concerned about into Copyscape’s search box and press the “Go” button. Copyscape will then spider its database, which is refreshed often, for duplicate content matching the queried URL.
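Copyscape’s actual matching method is not described here, so the following is only a rough sketch of one common way to flag copied text: break each page into overlapping runs of words (often called shingles) and measure how many of the original’s runs also turn up on the suspect page. The function names, the run length of 5 words, and the sample strings are all assumptions made for illustration, not anything Copyscape exposes.

```python
import re

def shingles(text, k=5):
    """Return the set of k-word sequences found in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(original, suspect, k=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    a = shingles(original, k)
    b = shingles(suspect, k)
    if not a:
        return 0.0
    return len(a & b) / len(a)

# Hypothetical sample data: one headline blurb, pasted whole into a scraper page.
original = ("Creative Technology was awarded a patent for inventing "
            "the user interface for portable media players")
scraped = ("local internet marketing " + original +
           " FileMaker has worked to improve integration")
unrelated = "Mozilla's Firefox browser has a toolbar extension called Stumble Upon"

print(overlap(original, scraped))    # high score: the blurb was copied verbatim
print(overlap(original, unrelated))  # low score: no shared word runs
```

A score near 1 suggests the original appears almost verbatim on the other page, while a score near 0 suggests unrelated text. A real service would also need to crawl and index the web, which this sketch does not attempt.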

For example purposes, I conducted a search using the WebProNews URL as the document to be tested. Because of the way Copyscape constructs its result page URLs (they only contain the address), I can’t link to the results of the WPN query. I can, however, tell you that (scraper site link) is currently scraping our content in order to get ranking benefits for the local internet marketing keyword phrase. At the bottom of the page I linked, you will find the following text:

local internet marketing

WebProNews – Breaking eBusiness News iEntry Members Save a trip to the post office! Click Here – For a special introductory offer. Violent Internet Porn May Face Ban In UK08-30-2005 03:06The British government announced today plans to ban downloading and possessing violent sexual material…Watch Out Apple, Creative’s Got the Patent08-30-2005 03:01Creative Technology was awarded a patent for inventing the user interface for portable media players…FileMaker Improves PDF Integration With Excel08-30-2005 02:30FileMaker has worked to improve integration of PDF files with Microsoft Excel, and has done so with the shipment of its a new version of its database, FileMaker Pro 8…Rainy Day Tips: Stumble Upon Mozilla’s Time Waster08-30-2005 02:14Maybe this is old news, but I haven’t heard many talking about it. Mozilla’s Firefox browser has a toolbar extension called “Stumble Upon” that will make you lose entire days to web surfing…

This mess goes on for some time

As you can see from that gibberish, the webmaster of ‘local-internet-whateveryoursite’ is actively scraping the WPN content (visible dates in the above mess are 8-30-2005). Apparently, this scraper site likes to keep its stolen content as fresh as ours.

As for Copyscape, once you find which sites are duplicating your content, it gives you some options. You can visit the page that has stolen the content, see the site’s whois information, see the text on your site, or follow a link that allows you to respond to web plagiarism. You can also add one of Copyscape’s “page protected by Copyscape” banners to your site in order to warn off future scrapers.

If these banners don’t stop thieves from stealing your content, Copyscape also offers Copysentry, an anti-plagiarism membership service that starts out at $9.95 a month for up to 20 pages. This service actively monitors the web and alerts members via email if copies of their content turn up elsewhere. Members can choose either daily ($19.95 per 10 pages a month) or weekly ($9.95 per) protection. The packages do just as they say: with the daily service, Copysentry monitors the web daily, while the less expensive service monitors on a weekly basis.

Even though my exposure to Copyscape is limited (I just started fooling with it today), I can say that if you are worried about another webmaster stealing your content, Copyscape is a very effective tool for tracking these people down. What you do when you find them is another story, however. Normally, cease and desist letters are an effective weapon in stopping content theft, but not always. Sometimes you have obstinate webmasters who will choose to ignore the letter or disagree with the original owner’s assessment. If this happens, you’d better be prepared for a contentious resolution, provided there is one to be had.

I would also like to know the size of the index Copyscape monitors while searching for stolen content. When I conducted the WPN query, only six sites were returned as having our content, and two of those sites were part of the iEntry network, meaning the content sharing was intentional. I am also aware of other sites that use our content, and they did not show up in the Copyscape query. When Copyscape increases its index or web coverage, it has the potential to be one of the most useful search tools out there. As is, it’s still a very good tool, especially if you have stolen content concerns.

Hat-tip to Erin Bradley and the SearchViews blog.

Chris Richardson is a search engine writer and editor for WebProNews. Visit WebProNews for the latest search news.