Being Outranked by a Scraper Site? Google Says it's Your Problem


A couple of weeks ago, I wrote about the results of an experiment that looked at how easy it was to outrank sites simply by stealing their content.

It turns out that, to varying degrees, this was indeed possible, even when the target sites were well-established authority sites like ClickZ and Econsultancy.

For example, PI Datametrics was able to outrank an Econsultancy article simply by replicating the content in full on its own blog.

The chart showed the Intelligent Positioning blog (the yellow line) outranking the original post, marooning it below page 100 of Google.

The copy was clearly attributed within the article and was published long after the original, so there should be no confusion over the dates. Scraping software wasn’t used, but it was a straight copy of the original.


However, speaking at SMX recently, Google Webmaster Trends Analyst Gary Illyes stated that if you are being outranked, it’s probably a problem with your site.

This is interesting, given the results of the recent experiment. By that logic, if scrapers can outrank Econsultancy and ClickZ, then both sites have a problem.

Having worked on both sites, this is news to me. Neither has received a penalty or experienced any dramatic loss of traffic or rankings.

Of course, there may be issues with both sites I’m unaware of, but Google’s assertion doesn’t make sense to me.

After all, it was only last year that Matt Cutts was asking for examples of just this issue.

Google also implemented a Scraper Report Tool, which has since been discontinued. There was some speculation about its purpose: was Google harvesting examples to help it deal with the issue, or was this a tool for reporting scrapers?

Either way, it sent the message that Google clearly regarded this as a problem. So, given this latest statement, can we conclude that Google has solved the issue of scrapers? I’m not so sure…

I’m not pretending this isn’t a difficult issue; there is no obvious method for distinguishing between original and copied content, especially when there’s no clear time gap between the two.

However, if this is happening as I and many others suspect, then Google is essentially ranking the wrong sites, which is bad for users and bad for the people creating original content.
