What Percent of Plagiarism is Allowed in a Blog Post?

Key Takeaways

Plagiarism checkers like Copyscape and Grammarly flag quoted content even when properly attributed, making some similarity score unavoidable.
An acceptable plagiarism threshold for blog posts is generally 10-15%, with some sources tolerating up to 20% similarity.
Google is more sophisticated than plagiarism tools and uses context to distinguish genuine theft from legitimate quoting and syndication.
Attribution is the critical factor - copied content only becomes stolen content when passed off as original without credit.
Two main plagiarism penalties exist: legal action from original creators and Google search ranking penalties for low-quality duplicate content.

Plagiarism is unacceptable in content marketing. But there's nothing new under the sun. The content marketing world is full of words, phrases, and quotes that get repeated everywhere, which makes it very hard to write anything truly original. Yet copying others without credit is a form of fraud, and it can earn you site demotions and other marketing problems. So how much "plagiarism" is acceptable?

Plagiarism Versus References

Plagiarism is one of those terms that gets tossed around without much understanding of what it means. It is primarily an ethical concept rather than a legal one, though it overlaps with copyright infringement and other forms of unauthorized copying.

In the space of content marketing, you have to consider a few things when thinking about plagiarism.

First of all, you have to know how to check for plagiarism. You can use online plagiarism checkers like Copyscape, Grammarly, and Turnitin, or sites like SmallSEOTools that integrate multiple checkers. These tools either maintain their own index of online content or run targeted Google searches for phrases from a submitted piece and match the results against existing content.

No tool has an index as large as Google's, though some may index tertiary sources of content that Google might miss. My own check for plagiarism is a Copyscape scan plus a few Google searches for full sentences.
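
As a rough illustration of the phrase matching described above, here is a minimal sketch. It assumes a checker compares overlapping runs of words (often called shingles) between a draft and one known source. The function names and the five-word window are hypothetical choices for this example, not how Copyscape, Grammarly, or any other tool actually works.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return every overlapping run of `size` words in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(draft: str, source: str, size: int = 5) -> float:
    """Percentage of the draft's shingles that also appear in the source."""
    draft_shingles = shingles(draft, size)
    if not draft_shingles:
        # Draft is shorter than one shingle, so there is nothing to match.
        return 0.0
    shared = draft_shingles & shingles(source, size)
    return 100.0 * len(shared) / len(draft_shingles)
```

Note that nothing in this sketch looks at quotation marks, links, or footnotes. It only counts matching word runs, which is exactly why attribution is invisible to this kind of check.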

Here's the issue: these checks don't account for references and quotations. They look for duplications in text. But they don't look for attributions. Here's an example. I can quote this block:

"Plagiarism usually happens when a writer fails to:

Cite quotes or ideas written by another author;
Enclose direct text in quotes; or
Put summaries and/or paraphrases in his or her own words."

And that would technically come up as an instance of plagiarism in this very blog post. I obviously quoted it. But what's the source? I haven't told you - which makes this a technical theft of content. Then again, maybe at the bottom of this post, I've added a footnote with the source. Copyscape and Grammarly aren't going to be able to find that.

Of course, I don't have this footnote. What I do have is this link right here. That quote comes from a legal reference on the subject of plagiarism. And now, with this paragraph, I've made my quote legitimate.

If you're writing a blog post that makes heavy reference to a few other sources of information, or that references just one source and then refutes it line by line, you're going to get a high percentage of plagiarism back from your checks. You need to use your own judgment and analyze the context of the copied content to see if it counts.
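
One way to start that analysis is to estimate how much of a draft sits inside quotation marks, since that portion is likely to be flagged no matter how well it is attributed. The sketch below is only a rough aid under a loud assumption: it treats anything between a pair of double quotation marks (straight or curly) as a quote, and the function name is hypothetical.

```python
import re

# Matches text between straight or curly double quotation marks.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def quoted_share(text: str) -> float:
    """Percentage of the words in `text` that fall inside double quotes."""
    total = len(text.split())
    if total == 0:
        return 0.0
    quoted = sum(len(match.group().split()) for match in QUOTED.finditer(text))
    return 100.0 * quoted / total
```

If a checker reports 18% similarity and about 15% of the post is quoted material with sources, most of that score is probably legitimate quoting rather than theft. The numbers in that example are made up for illustration.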

You also have the issue of common sentences. Certain sentences become truisms within an industry: repeated as catchphrases, used as ironic humor, or simply stating commonly cited facts. If they're sufficiently long, these can trigger plagiarism scans, even when there's no clear original source or when the sentence has basically become an industry meme.

All of this comes together to show you that blog posts are going to show at least some level of plagiarism when scanned by any automated process - maybe a meager 2-3%, or maybe higher. Ironically, the more time you spend collecting references and quoting sources, the more likely your post is to be flagged.

This isn't to say you should stop quoting or citing sources; these add value to your post. What you should do is accept that tools like Copyscape, Grammarly, and Turnitin are not the ultimate arbiters of plagiarism.

Plagiarism and Templates

Plagiarism can also involve non-text content, which is harder for online scans to find. It can include a full website layout that closely matches another site's original design, which courts have considered a form of plagiarism or copyright infringement.

This is where things can get very cloudy. Does this precedent mean that everyone using the same WordPress theme is guilty of plagiarism? Well, no - and the reason is attribution and licensing.

WordPress themes are explicitly created for anyone to use. Their creators are not going to pursue copyright claims against the people who use them. Some themes have a "theme created by X" attribution in the footer, or just in a comment in the code, and that's all the creators care about. Remove it and you might have problems, but most likely they will simply ask you to restore it.

When someone copies the site design of a site that isn't a publicly available template, that's when plagiarism comes in - and that's when the original creator can start to take action.

Plagiarism Judgment and Penalties

When does plagiarism matter? Think about it - we're on the internet; there are people out there who have made a living only off stolen content, copied wholesale and infused with links or ads.

There are basically two possible penalties for plagiarism, though a third is implicit.

The first penalty is rare but very damaging: the original creator discovers that you plagiarized their content and sues you. You will be forced to prove that you created the content or else admit that you stole it. Stealing it is a violation of copyright and can carry a number of penalties, ranging from taking down the copied content to monetary damages.

It's rare because most websites don't have the inclination or the funds to take every offender to court. Copied content is a very prevalent problem, and most of the time the thieves are never found or brought to trial. Many of them use false information or reside in a country where legal action is difficult to pursue. If someone stole one of your blog posts, would you consider taking them to court? My guess is probably not.

The other penalty, and the primary one, is the Google search penalty. Starting with the Panda algorithm update and continuing through every core update since, Google has taken copied content increasingly seriously, and its ability to detect it has grown far more refined over time.

Google is far more advanced than tools like Copyscape or Grammarly. It can use context to distinguish content that's actually stolen from content that's not. Here are some things that might trip up Copyscape but won't trip up Google:

Content posted on two versions of a website, for example a standard and a mobile site.
Content posted on a blog and duplicated on a printer-only version of the page.
Content in a store that has dynamic URLs, showing the same content on multiple different pages.
Content published with attribution on multiple URLs, as with syndication.

Most of these problems can be solved with the rel="canonical" link element: you add it to the head of each version of a piece of content and point it at the one URL you want indexed. That works for full pages, though. What about longer quote blocks?

Again, this is where Google's sophistication comes in. Google can read context and recognize when a piece of content is quoted in part but surrounded by original content. A formatted quote block is easy to distinguish from a stolen blog post.

Partial copied content that's used without attribution or passed off as original is harder to detect. But it can still run afoul of Google's push to surface quality results. If one page was published in January, and another with largely the same content - and nothing new of value - is published in March, Google is more likely to rank the earlier one. Duplicating content mostly just means your post won't rank in comparison to the original.
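
Returning to the full-page duplicates listed earlier (mobile versions, printer-only pages, dynamic URLs), here is a minimal sketch of the rel="canonical" approach. The URLs and the function name are hypothetical. The idea is that every duplicate carries the same link element in its head, pointing at the one version you want indexed.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Build the <link> element to place in the <head> of each duplicate page."""
    # Escape the URL so characters like & are valid inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# Hypothetical example: three addresses that serve the same article.
duplicates = [
    "https://example.com/blog/post",
    "https://m.example.com/blog/post",
    "https://example.com/blog/post?print=1",
]
preferred = duplicates[0]

for url in duplicates:
    # Every version, including the preferred one, points at the same URL.
    print(url, "->", canonical_link(preferred))
```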

The third, implicit penalty comes when your site is found and penalized for plagiarism, or has its results removed from search through DMCA legal notices: damage to your reputation. You become a known content thief and can have all your guest posts, all your links, and all your references disappear overnight. Legitimate marketers don't want to work with spammers.

How Much Plagiarism Is Acceptable?

Back to the original question at hand, the answer changes depending on context - and the thresholds cited across the industry can vary widely.

For SEO-focused blog posts, a commonly cited acceptable threshold is around 15%. For blog writing more broadly, anything up to around 10-15% is usually considered acceptable, depending on post length. Google itself is said to be fairly tolerant of up to roughly 2-3% similarity for ranking purposes, though that figure covers only what automated matching might flag, not necessarily what Google's wider quality systems review.

For content agencies and professional creators, the preferred target remains 0% intentional plagiarism - though in practice, incidental similarity from common phrases, quoted sources, and industry truisms means most well-cited posts will register some degree of similarity in automated scans. According to PlagiarismCheck.org, up to 20% similarity may be tolerated before it becomes a concern.
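
To make the arithmetic concrete, here is a small sketch that turns a matched-word count into a percentage and sorts it into the bands mentioned above. It assumes the score is simply matched words divided by total words, which may not be how any particular checker calculates it. The band labels and function names are hypothetical, and the cutoffs are the figures cited in this post, not official limits.

```python
def similarity_score(matched_words: int, total_words: int) -> float:
    """Similarity as a percentage: matched words over total words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * matched_words / total_words


def assess(score: float) -> str:
    """Sort a similarity percentage into the bands cited in this post."""
    if score <= 3:
        return "incidental"          # the roughly 2-3% range
    if score <= 15:
        return "commonly accepted"   # the 10-15% range, given attribution
    if score <= 20:
        return "upper tolerance"     # the 20% figure from PlagiarismCheck.org
    return "review and rewrite"


# Hypothetical example: a 1,500-word post with 180 flagged words.
score = similarity_score(180, 1500)
print(f"{score:.1f}% -> {assess(score)}")
```

A label like "commonly accepted" still depends on what the matched text is. Attributed quotes at 12% and unattributed copying at 12% are not the same thing.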

Here are a few different perspectives you can take when creating your own position:

First: the puritan point of view. In your mind, everything needs to be 100% original. Any result - even a 2% flag from Copyscape - is too much. You work to make everything original.

My view on this perspective is that it's unrealistic. Yes, everything you write should be original. However, we live in a society and we participate in a community. Without building upon the work of others, we can't progress. Quoting those with more experience than you, referencing points you want to support or refute - these can trigger plagiarism checkers. But when used appropriately, they help you participate meaningfully in your community.

Second: the standard point of view. You view your work in the bigger context of the community in which you're working. You quote others with more knowledge in certain areas, and you don't mind when others quote you. You keep your plagiarism score comfortably below 10-15%, you use attribution, and move on.

My view on this is that it's the most sensible and sustainable position. Big names in your industry reference others. News articles quote from press releases. Interviews quote the same person across multiple sites - it's just how the world works.

The part that matters is attribution. Copied content is only stolen content if you're trying to pass it off as your own.

Third: the pragmatist point of view. This is the "gray hat" strategy; you'll do whatever works. If that means building a blog out of 50% copied content per post - or spinning content, or something similar - then so be it. As long as you aren't incurring penalties for it, what does it matter?

My view on this is that it's all too common. People like this will always be out there. If Google decides any site with more than 50% copied content should be removed from their index, those sites will adjust just enough to hit 49% - it's an endless arms race with no clean answer.

I don't recommend this perspective. It may be helpful on a small scale. But you'll never become the next authority in your niche with that attitude. When you're fighting to survive instead of fighting to grow, your ceiling for performance is much lower.

Fourth: the unrepentant thief. There are people out there who will shamelessly steal content from anyone and everyone, just to make a quick dollar. If they get a DMCA notice, so what? They remove that content and carry on. They hide behind false information so they're impossible to sue, and they basically make their living as small-time internet criminals.

My view on this is, of course, that it's shortsighted. These people put effort into trying to skirt the law when that same effort could establish them as a legitimate presence quite effectively. They will always be out there - but they will also always have a ceiling.

Where do you fit? What's your point of view?

2 responses

F68.10 says:

May 26, 2019 at 4:03 pm

"The first penalty is rare, but very damaging. This is the penalty wherein the original creator of the content discovers that you plagiarized their content and sues you."

Are there any kind of official statistics as to how often suing happens in the US on the topic of plagiarism?

James Parsons says:

June 9, 2019 at 4:56 pm

On the topic of web content, I think this is extremely rare, especially if the two businesses involved aren't in the same country. To be safe, always cite your sources if using content from another website!

James Parsons