Sites which scrape and copy content from other websites are often able to outrank the original source, as the examples in this post will demonstrate.
These “content thieves” can damage the original site’s rankings, causing a loss of search visibility, potential sales, and leads.
The examples here call into question the effectiveness of Google’s handling of content scrapers. At the moment, Google seems to be unable to recognize the original source of content consistently. This is a real problem for content creators.
In advance of Pi Datametrics’ talk at last week’s Brighton SEO conference, we devised a test to see how easy it was to disrupt another site’s rankings by copying its content.
In this post, I’ll look at the results of these tests and discuss what publishers can do to combat the problem.
Note: these examples are from Google.co.uk.
Can Copycat Sites Outrank the Originals?
The original idea for the test came when Pi Datametrics noticed the volatility of a client’s search rankings.
After some investigation, they found the cause was content thievery.
Example 1: Journeys by Design
Journeys by Design is a site that offers luxury African safari holidays. It has produced some well-researched content for its pages such as this one, which was written specifically for Journeys by Design.
However, this copy was lifted word-for-word and used by another website that also offers safari holidays.
It has also been copied by at least three other sites, such as this one:
In theory, Google should recognize the original source of the content and ensure that the copycat site doesn’t rank above the original for related searches. Yet that hasn’t happened.
The chart below shows the search rankings for the term “mountain gorilla’s nest” over an eight-month period. The blue line shows the rankings for Journeys by Design. The other lines are from the sites that copied the original content.
We can see that the original site has ranked most consistently for the term over this period, but several of the copycat sites were able to rank at various times.
More significantly, these copycat sites also outranked the original, causing Journeys by Design to slip below position 100 for days and weeks at a time.
This obviously has commercial implications, as searchers looking for safari holidays would be unable to find the site with one of its target terms, thanks to the copycats.
Having noticed this, Pi Datametrics decided to set up a test to see how easy it would be to rank for stolen content.
Generally speaking, the copycat sites are considerably weaker: they have fewer links and less valuable content, apart from what was lifted from elsewhere.
Does this mean any weak site can simply steal content from stronger rivals and outrank them?
Example 2: Econsultancy
To find out, Pi Datametrics took an interview on PPC strategy that was posted on Econsultancy – my old site – and placed it on their Intelligent Positioning blog. The content was copied word-for-word, with their permission.
When searching for the article title, we can see that the copied version briefly interrupts the original’s search position.
However, if we search for a more generic term, like “PPC strategy,” the content thief is able to outrank the original, as shown by the red line.
It seems that Google didn’t know which one should rank for a while, with positions swapping for a few days, but the copied article won out in the end.
As I write this, the copied article sits at number 25 in Google, and the original is nowhere to be seen.
Example 3: ClickZ
The same test was carried out using ClickZ content. In this case, we used a guest post by Bryan Eisenberg on web form optimization.
Again, the content was copied word-for-word, with an image featuring a note to explain.
The results of this test are interesting. For one thing, it didn’t interrupt ClickZ’s rankings as much as in the two previous examples.
After the article was copied, the original continued to rank reasonably consistently for the phrase “online web form optimization”.
The odd thing is that the copied version also ranks in the top three positions for the same term, at the same time.
If you look closely, you’ll notice the dips in early September correspond to the peaks in ClickZ’s search rankings. The copy is having an effect on ClickZ’s position, but not as much as we might have expected.
However, the troughs from August 20 do correspond to the peak of another site: Bryan Eisenberg’s blog.
Bryan published his ClickZ post in full on his own blog and had been enjoying some decent search visibility for the same search term, though that was before the copied article was published.
Bryan’s post has virtually vanished from the SERPs for this term, and now it’s replaced by the version copied from ClickZ for the test.
At the moment, the copycat version outranks the original ClickZ version, taking second place on Google U.K.
Meanwhile, the ClickZ article is five places below, and Bryan Eisenberg’s version isn’t even in the top 100 positions.
So once again, the copycat site is able to disrupt the search rankings of the original content producers, outranking them for periods of time.
Once the copied version has been removed from the Intelligent Positioning blog, I expect that ClickZ will return to the top two or three positions on Google. What happens to Bryan’s version of the post remains to be seen.
As I mentioned before, these tests were carried out on Google U.K. (more will follow using .com), and the differences between the U.S. and U.K. results are interesting.
For example, while the IP blog was able to disrupt Bryan Eisenberg’s U.K. rankings with its copied content, it didn’t have quite the same effect in the U.S. SERPs.
It did rank for a short time, but my guess is that a combination of the other two sites’ authority and their U.S. location knocked it back down.
Also, Bryan’s version of the post continues to outrank ClickZ, yet Google still allows both versions to rank highly.
Why Does This Matter?
Content is very important for achieving SEO goals, though it does indeed have a life beyond the search engines. The ideal article is useful, providing value for readers over time while retaining high search visibility.
After taking the time, thought, and energy to work on composing a complex article, it’s rather annoying to find that another site can simply steal it and reap the SEO benefits.
This underlines the importance of tracking the performance of your content long after pressing the Publish button. Close monitoring of your site’s rankings enables you to take action against the copycats.
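As a sketch of what that monitoring can look like in practice: if you export daily rank observations from whatever rank tracker you use, a few lines of code can flag the kind of sudden drops seen in the examples above. The function name, threshold, and sample data below are all illustrative assumptions, not from any particular tool.

```python
def flag_rank_drops(daily_positions, threshold=10):
    """Return (date, old_pos, new_pos) for day-over-day drops larger
    than `threshold` positions (a higher number = a worse ranking)."""
    alerts = []
    # Compare each day's position with the previous day's.
    for (d1, p1), (d2, p2) in zip(daily_positions, daily_positions[1:]):
        if p2 - p1 > threshold:
            alerts.append((d2, p1, p2))
    return alerts

# Illustrative history: a copycat briefly knocks the site down the SERPs.
history = [
    ("2014-08-18", 2), ("2014-08-19", 3), ("2014-08-20", 41),
    ("2014-08-21", 45), ("2014-08-22", 3),
]
print(flag_rank_drops(history))  # → [('2014-08-20', 3, 41)]
```

An alert like this is exactly the “unexplained fluctuation” that should prompt a look at who else is ranking for the term.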
It also has implications for things like product copy used across multiple sites, which is something I’ll look at in more detail in a future article.
What Should Sites Do?
There are a number of measures that sites can take:
- Monitor search rankings so unusual drops can be spotted early.
- Look at the SERPs and see who else is ranking for your terms. How are they achieving this?
- Identify the offending website. A quick search for chunks of your copy can help to do this.
- Take the appropriate action. This may mean contacting the website owner to ask them to remove the copied content, or using Google’s content removal form. Google also used to offer a scraper reporting tool, but that has been closed down.
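To make the “search for chunks of your copy” step concrete, here is a minimal sketch that splits an article into exact-phrase queries you can paste into Google. The chunk size and sample text are assumptions for illustration; long quoted phrases tend to surface word-for-word copies.

```python
from urllib.parse import quote_plus

def chunk_queries(text, chunk_words=8, max_queries=3):
    """Split article text into fixed-size word chunks and build
    exact-match (quoted) Google search URLs, one per chunk."""
    words = text.split()
    queries = []
    for start in range(0, len(words), chunk_words):
        chunk = " ".join(words[start:start + chunk_words])
        if len(chunk.split()) < chunk_words:
            break  # skip a short trailing fragment
        queries.append(
            "https://www.google.co.uk/search?q=" + quote_plus('"%s"' % chunk)
        )
        if len(queries) >= max_queries:
            break
    return queries

# Illustrative opening copy from one of the examples above.
sample = ("Journeys by Design is a site that offers luxury African "
          "safari holidays with well-researched destination content")
for url in chunk_queries(sample):
    print(url)
```

Any result for a quoted chunk that isn’t your own page is worth investigating as a potential scraper.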
.@mattcutts I think I have spotted one, Matt. Note the similarities in the content text: pic.twitter.com/uHux3rK57f
— dan barker (@danbarker) February 27, 2014
In addition, the Bryan Eisenberg example provides a lesson for sites accepting guest posts. It’s worth ensuring that guests don’t republish content in full on their own sites as this may affect your own rankings.
In my experience, it’s better to ask them to publish extracts that point back to the original, thus avoiding the issue. Alternatively, ask them to add a rel=canonical link indicating the original post.
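For guests who do republish in full, the canonical link goes in the head of their copy and points back to the original. A minimal sketch (the URL here is hypothetical):

```html
<!-- On the guest's republished copy, inside <head>. -->
<!-- Signals to Google that the original post is the canonical version. -->
<link rel="canonical" href="https://www.example.com/original-guest-post/" />
```

Google treats cross-domain rel=canonical as a hint rather than a directive, so it’s still worth monitoring rankings afterwards to confirm it’s being honoured.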
What Does This Tell Us about Google?
One obvious conclusion is that this is an area where Google needs to improve. When content copying works like this, it gives scrapers an incentive to keep using the tactic.
Google did introduce a Scraper Report form last year, inviting examples of copycat sites outranking the originals.
The form is now closed, which suggests its purpose was to gather examples to help Google improve its algorithm.
More broadly, Google’s method of dealing with this issue is inconsistent. For instance, in the ‘PPC strategy’ example earlier in this post, the original and copycat sites swap positions frequently, as if Google isn’t able to determine which is the copycat site.
As Pi Datametrics’ Jon Earnshaw explains:
Having content stolen can be an extremely frustrating and costly issue. It looks like there is a flaw in Google’s algorithm when it deals with duplicate content.
The best thing to do is track your terms and see if others are harming your site. You can only see this flipping of positions if you have daily URL tracking. If you see unexplained fluctuations, dig deeper; ultimately, you can report the abuser to Google.
The examples here tell us that copied content can be a real issue for sites, which will be losing traffic and potential sales through no fault of their own.
It also tells us that content thieves can win. They may not be able to rank consistently, but they can reach high search positions simply by cheating.
In the long-term, I’d hope that Google’s handling of copied content improves but, in the meantime, sites do need to be aware of this problem so they can take the appropriate action to minimize the damage.
By the way, it’s worth checking out Jon’s slides from Brighton SEO.