How to Use Copyscape to Avoid Future Google Penalties
Published by James Parsons • Category: Search Engine Optimization

When the original Panda update hit, sites everywhere bemoaned the events that led to their penalties. Among the many faults that Google targeted, copied content was perhaps the most important. Copied content, even syndicated content or product entries that were too similar to one another, earned sites a hefty penalty. Some sites could remove a single page and regain their rankings. Others required near-complete rewrites of their existing content. Some simply folded up and disappeared. It was the end of an era and the start of something new.

The Perils of Copied Content

Copied content can show up anywhere, and sometimes you have little or no control over it.

  • At the most basic level is content you intentionally copied, whether you published the same post on multiple blogs or scraped content from a higher-ranked site. In the past, some of this worked for SEO. Now it doesn’t.
  • Occasionally, when you purchased content, you bought only the usage rights to the piece. Anyone else who bought usage rights could post the same content, leading to duplicate content issues.
  • Content that was spun often counted as duplicate content. Google maintains a dictionary similar to those used by spinners, which lets it recognize variations of a phrase as spun. In sufficient quantities (that is, across most of an article), spun content flags the piece as copied.
  • Syndicated content – content officially and benignly posted on multiple sites – would occasionally trip penalty flags. There’s nothing wrong with syndication, but for a while, it was a hazardous practice.
  • Product descriptions were often flagged on large e-commerce sites. An inventory of 30 different types of screws would often have near-identical product descriptions with only a few small details changed, and these ended up flagged as duplicate content.

With so many ways duplicate content could appear on your site, it’s easy to see how so many sites were penalized. Sites that recovered needed a tool to prevent future instances of duplicate content, and the tool that has emerged for the job is Copyscape.

What Is Copyscape? How Does It Work?

Copyscape is the current industry-standard plagiarism checker. It’s the most widely used of many such tools, though nothing stops you from using more than one to be extra certain. In essence, Copyscape scans the Internet for instances of a piece of content. If the content you search for appears elsewhere, in whole or in part, Copyscape will let you know.


With Copyscape, you plug in a piece of content, small or large. It might be a 200-word product description. It might be a 5,000-page website. Whatever the content is, you submit it and let Copyscape search for copies of it anywhere online.

This provides a few benefits. First, it tells you whether any of your content has been copied or scraped elsewhere online. Second, it tells you whether a piece of content you purchased has already been published somewhere else. Finally, you can compare two pieces of content to check their similarity, which is useful if you want to see how different a spun article has to be before it is considered unique.
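Copyscape doesn’t publish how its comparison engine works, so the sketch below is not its algorithm. It shows one common, generic way to measure how much two texts overlap: break each into overlapping runs of k words ("shingles") and compute the Jaccard similarity of the two sets, that is, the number of shared shingles divided by the total number of distinct shingles. The default of k = 5 is an arbitrary assumption; smaller values are more sensitive to lightly reworded text, while larger ones mainly detect long verbatim runs.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


original = "Copied content could show up anywhere and sometimes you have little control over it"
spun = "Copied content could appear anywhere and sometimes you have little control over it"
print(round(jaccard_similarity(original, spun, k=3), 2))  # 0.53
```

Swapping "show up" for "appear" breaks every 3-word shingle that touches those words, so two sentences that a reader would call nearly identical score only about half.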

Copyscape, Copyscape Premium and More

Copyscape comes in two primary flavors: free and premium. The free version allows a few basic searches and comparisons, with a limited set of search results and a cap on the number of searches you can perform each month. It works for small businesses checking the occasional blog post, but if your site is larger or you handle higher volumes of content, you’ll want to find the budget for Premium.

The premium Copyscape service offers a number of additional features.

  • A more powerful search than what you find with the free service.
  • Searches of content that isn’t online yet, by copying and pasting the text directly.
  • A site-wide batch search, complete with a tag for skipping certain content.
  • An API for automatic searching of submitted content.
  • A low fee of only five cents per search or comparison.

If you publish content frequently, particularly if you purchase that content from a content mill or marketplace, you’ll benefit from using the API to search for matches.
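To budget for API checks, multiply your monthly content volume by the number of checks per piece and the per-search fee. The sketch below uses the five-cent rate quoted in the list above as its default; treat that as an assumption and confirm Copyscape’s current pricing, which may also vary with text length. Working in whole cents avoids floating-point rounding.

```python
def monthly_check_cost_cents(pieces_per_month: int, checks_per_piece: int = 1,
                             cents_per_check: int = 5) -> int:
    """Estimated monthly spend on Copyscape Premium checks, in cents.

    cents_per_check defaults to the five-cent rate cited in this article
    (an assumption; current pricing may differ).
    """
    if min(pieces_per_month, checks_per_piece, cents_per_check) < 0:
        raise ValueError("inputs must be non-negative")
    return pieces_per_month * checks_per_piece * cents_per_check


# 120 purchased articles a month, each checked on delivery and again after revision.
cost = monthly_check_cost_cents(120, checks_per_piece=2)
print(f"${cost / 100:.2f}")  # $12.00
```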

Using Copyscape Intelligently

Interpreting the results is the most important part of using Copyscape. If you blindly reject any content with a match, you’re probably missing out on quality content. When you view the search results for a given page, consider what they mean.

  • No matches. This is, of course, good: it means the content is unique as far as Copyscape can tell.
  • Minor matches. Have you ever entered search terms into Google and had pages come up that only included a couple of the words, but had nothing to do with your overall meaning? Sometimes, this happens with Copyscape results. A single common phrase can trigger a plagiarism notice, even when the context of that phrase is entirely different.
  • Larger matches. Occasionally, a piece of content will be flagged as including some percentage of duplicate content. What is that content? A direct quote with attribution, for example, will be flagged, yet it’s perfectly legitimate. If you blindly reject anything with a Copyscape match, you’ll lose that valuable sourced content.
  • Complete matches. Copyscape makes no distinction about the source of a given match. Was the content it found on your own site? Occasionally, Copyscape will tell you that you’ve posted your own content. This can often be solved with the rel=canonical tag.
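If you pull results through the API rather than reading them by hand, you can apply the same reasoning in code. The sketch below sorts matches into the categories described in the list above. The `Match` record, its `percent_matched` field, and the 10% and 90% cut-offs are hypothetical choices for illustration, not Copyscape’s response format or recommended thresholds; map them onto whatever fields your integration actually receives.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Match:
    """Hypothetical match record: where a copy was found and how much matched."""
    url: str
    percent_matched: float  # share of your text found at this URL, 0-100


def triage(matches: list[Match], own_domain: str,
           minor_max: float = 10.0, complete_min: float = 90.0) -> dict[str, list[str]]:
    """Group match URLs by category; an empty dict means no matches."""
    groups: dict[str, list[str]] = {}
    for m in matches:
        host = (urlparse(m.url).hostname or "").lower()
        if host == own_domain or host.endswith("." + own_domain):
            label = "own_site"   # your own page: consider rel=canonical
        elif m.percent_matched >= complete_min:
            label = "complete"   # likely a full copy: investigate
        elif m.percent_matched > minor_max:
            label = "larger"     # review by hand: may be an attributed quote
        else:
            label = "minor"      # probably a common phrase
        groups.setdefault(label, []).append(m.url)
    return groups


results = [
    Match("https://www.example.com/old-post", 100.0),
    Match("https://scraper.example.net/copy", 95.0),
    Match("https://news.example.org/story", 4.0),
]
print(triage(results, own_domain="example.com"))
# {'own_site': ['https://www.example.com/old-post'], 'complete': ['https://scraper.example.net/copy'], 'minor': ['https://news.example.org/story']}
```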


Copyscape is a brilliant and valuable way to catch content scrapers and content thieves. It’s also an important tool for analyzing content you buy to make sure it’s unique.

When it comes time to actually use Copyscape, you have a few options. First, you can manually submit and scan any piece of content you’re about to purchase. If that content is copied, you can have it revised or reject it, depending on the seller’s policies. You can also use the Premium features to scan your entire site for copies elsewhere on the web. If you find such copies, you can decide how to tackle the duplicate content, a topic that could fill an entire blog post of its own.
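A site-wide scan starts with a list of your pages. If your site publishes an XML sitemap in the standard sitemaps.org format, a sketch like the one below collects its page URLs and drops any sections you don’t want checked, such as tag archives, before you hand the list to Copyscape’s batch search or API (check which input formats they accept). The `skip_prefixes` filter is our own illustration, not a Copyscape feature.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str, skip_prefixes: tuple[str, ...] = ()) -> list[str]:
    """Return the <loc> URLs from a sitemap, minus any under a skipped prefix."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # str.startswith accepts a tuple; an empty tuple never matches.
        if url and not url.startswith(skip_prefixes):
            urls.append(url)
    return urls


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post-1</loc></url>
  <url><loc>https://example.com/tag/seo</loc></url>
  <url><loc>https://example.com/products/screws</loc></url>
</urlset>"""
print(urls_from_sitemap(sample, skip_prefixes=("https://example.com/tag/",)))
# ['https://example.com/blog/post-1', 'https://example.com/products/screws']
```

This handles a single `urlset` file; a sitemap index that points to several sitemaps would need one more level of fetching.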

For larger sites or a more automated process, you can integrate the Copyscape API into your publication, submission or purchase workflow. The API is flexible and can be configured in a number of ways to suit your needs.
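As a rough illustration, the sketch below builds a Premium API text-search request and gates publication on the result. The endpoint and the parameter names (`u`, `k`, `o=csearch`, `e`, `t`) reflect our understanding of Copyscape’s API documentation and should be verified against the current docs before use. The request is only built, not sent, and `publish_gate` (a hypothetical helper) takes the search as a function, so you can plug in a real HTTP call plus response parsing (not shown here) or a stub for testing.

```python
import urllib.parse
import urllib.request
from typing import Callable

# Assumed endpoint; confirm against Copyscape's API documentation.
API_URL = "https://www.copyscape.com/api/"


def build_text_search(username: str, api_key: str, text: str) -> urllib.request.Request:
    """Build, but do not send, a POST request asking Copyscape to search for text."""
    body = urllib.parse.urlencode({
        "u": username,   # account username (assumed parameter name)
        "k": api_key,    # API key (assumed parameter name)
        "o": "csearch",  # operation: search the web for copies (assumed)
        "e": "UTF-8",    # encoding of the submitted text (assumed)
        "t": text,       # the content to check
    }).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, method="POST")


def publish_gate(text: str, count_matches: Callable[[str], int],
                 editor_approved: bool = False) -> str:
    """Return "publish" if an editor cleared the piece or no matches were found, else "hold"."""
    # Checking approval first avoids paying for a search on already-cleared content.
    if editor_approved or count_matches(text) == 0:
        return "publish"
    return "hold"


request = build_text_search("your-username", "your-api-key", "Draft article text")
print(request.get_method(), request.full_url)  # POST https://www.copyscape.com/api/
print(publish_gate("Draft article text", count_matches=lambda t: 0))  # publish
```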

You can also use Copysentry, Copyscape’s active protection service. For a monthly fee, Copysentry monitors up to 500 pages on an ongoing basis: ten pages are covered by the base fee, and each page beyond those ten carries an additional per-page fee. Scans run either weekly or daily, depending on the level of service you buy.

The end result is, primarily, awareness. Copyscape is not a content removal tool. If you find your content has been scraped, or that content you’re interested in buying has already been published, it’s up to you to decide what to do. What Copyscape does give you is the information you need to avoid duplicate content, and with it, future Google penalties.

Written by James Parsons


James is a content marketing and SEO professional who enjoys the challenge of driving sales through blogging while creating awesome and useful content.
