Plagiarism is unacceptable, but there’s nothing new under the sun. The world of content marketing is full of words, phrases, and quotes that make it incredibly difficult to write anything completely unique. Yet to copy someone else is akin to fraud, and can earn you site demotions and other marketing difficulties. How much “plagiarism” is acceptable?
Plagiarism is one of those terms that gets tossed around a lot by people who don’t know exactly what it means. Strictly speaking, it is an ethical concept rather than a legal one, though it overlaps with copyright infringement, which does have a legal definition, and with other forms of content copying.
In the realm of content marketing, you have to consider a few factors when thinking about plagiarism.
First of all, you have to consider how people check for plagiarism. Most people use online plagiarism checkers such as Copyscape and Grammarly, or aggregators like SmallSEOTools that integrate such tools. These checkers either maintain their own index of online content, or they run extensive and creative Google searches for relatively unique phrases from a submitted piece to match against existing content.
No tool has an index as large as Google’s, though some may index tertiary sources of content Google might miss. My own personal check for plagiarism involves running a Copyscape scan and a few Google searches for full sentences.
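To give a feel for the phrase-matching idea, here’s a minimal Python sketch. It is not how Copyscape or Grammarly actually work internally (I don’t know their implementations); it simply breaks a submitted piece into overlapping runs of words and counts how many of those runs also appear in one existing document. The function names and the five-word window are my own assumptions for illustration.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") found in text."""
    # Lowercase and keep only word-like tokens so punctuation doesn't matter.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap_ratio(submitted, existing, size=5):
    """Fraction of the submitted text's shingles that also occur in existing."""
    submitted_shingles = shingles(submitted, size)
    if not submitted_shingles:
        # Text shorter than one shingle: nothing to compare.
        return 0.0
    shared = submitted_shingles & shingles(existing, size)
    return len(shared) / len(submitted_shingles)
```

A real checker compares against an index of millions of pages rather than a single document, but the principle is the same: the longer and more unusual the shared run of words, the more likely it is to be flagged.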
Here’s the issue: these checks do not account for references and quotations. They look for duplications in text, but they don’t look for attributions. Here’s an example. I can quote this block:
“Plagiarism usually occurs when a writer fails to:
And that would technically come up as an instance of plagiarism in this very blog post. I obviously quoted it, but what’s the source? As of yet, I haven’t told you, which makes this a technical theft of content. Then again, maybe all the way at the bottom of this post, I have a footnote with the source. Copyscape and Grammarly aren’t going to be smart enough to identify that.
Of course, I don’t have such a footnote. What I do have is this link right here. That quote comes from an old LegalZoom blog post on the subject of plagiarism. And now, with this paragraph, I’ve made my quote completely legitimate.
If you’re writing a blog post that makes heavy reference to several other sources of information, or even just references one source that you then refute line by line, you’re going to get a high percentage of plagiarism back from your checks. You need to use your own judgment and analyze the context of the copied content to see if it counts.
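To make that judgment call a little more concrete, here’s a rough Python sketch. It assumes you already have a list of passages that a checker flagged (the `flagged_passages` argument is hypothetical; no real tool’s output format is implied) and compares the flagged share of the post with and without passages that sit inside quotation marks. It only detects quotation marks, not whether a source is actually credited, so it is an aid to your own review and not a replacement for it.

```python
import re

# Matches text wrapped in curly or straight double quotation marks.
QUOTED = re.compile(r'“[^”]*”|"[^"]*"')

def flagged_share(text, flagged_passages, ignore_quoted=False):
    """Share of the words in text that fall inside flagged passages (0.0 to 1.0)."""
    total_words = len(text.split())
    if total_words == 0:
        return 0.0
    flagged_words = 0
    for passage in flagged_passages:
        if passage not in text:
            continue  # the checker reported something that isn't in this text
        inside_quote = any(passage in m.group(0) for m in QUOTED.finditer(text))
        if ignore_quoted and inside_quote:
            continue  # visibly quoted, so leave it for a human to verify
        # Simplification: overlapping flagged passages would be counted twice.
        flagged_words += len(passage.split())
    return flagged_words / total_words
```

If the share drops sharply once quoted passages are ignored, most of what the checker flagged is probably quotation, and the remaining question is whether each quote is properly attributed.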
You also have the issue of common sentences. Certain sentences tend to be truisms amongst your industry, repeated as catch phrases or as ironic sources of humor, or even just facts that are often repeated. These, if they’re sufficiently long or unique, can trigger plagiarism scans, even when there’s no clear source of the original sentence, or even when it has basically become an industry meme.
All of this comes together to show you that a lot of blog posts are going to show at least some level of plagiarism when scanned by any automated process. It might be a meager 2-3%, or it might be higher. Ironically, the more time you spend gathering references and quoting sources, the more likely your post is to be flagged.
This isn’t to say you should stop quoting or citing sources; they add value that can be very worthwhile to your post. What you should do is recognize that sites like Copyscape and Grammarly are not the arbiters of plagiarism.
Plagiarism can also include non-text content, which makes it harder for online scans to find. This example includes a full website layout that matches very closely to the original, and is considered plagiarism.
This is where things can get very cloudy. Does this precedent mean that everyone using the same WordPress theme is guilty of plagiarism? Well, no, and the reason is attribution and licensing.
WordPress themes are explicitly created for anyone to use them. The creators of those themes are not going to pursue copyright violations against the people who use them. Some themes have a “theme created by X” attribution in the footer, or even just in a comment in the code, and that’s all they care about. Remove that and you may have issues, but most likely they will simply ask for it to be restored.
When someone copies the site design of a site that isn’t using a publicly available template, that’s when plagiarism comes in, and that’s when, like the example above, you can begin to take action.
When does plagiarism matter? After all, we’re on the internet, where there exist people who have made a living solely off stolen content, copied wholesale and infused with links or ads.
There are essentially two possible penalties for plagiarism, though a third is implicit.
The first penalty is rare, but very damaging. This is the penalty wherein the original creator of the content discovers that you plagiarized their content and sues you. You will be forced to prove that you created the content, or else admit that you stole the content. This will be a violation of copyright and can come with a wide array of penalties, ranging from taking down the copied content to monetary damages.
This is rare because most websites don’t have the inclination or the funds to take every offender to court. Copied content is a very prevalent problem, and most of the time the thieves are difficult to identify or bring to trial. Many of them use false information or simply reside in a country where legal action is difficult to pursue. If someone stole one of your blog posts, would you consider taking them to court? My guess is probably not.
The other penalty is the primary penalty, and it’s the Google search penalty. Around 2011, when Google first rolled out the Panda algorithm update, they began taking copied content very seriously.
Google is a lot more sophisticated than sites like Copyscape and Grammarly. They’re able to use context and differentiate between content that is actually stolen and content that is not. Here are some things that might trip up Copyscape but won’t trip up Google:
Most such issues can be solved with proper use of the rel="canonical" tag, which you add to each version of a piece of content. This is for full pages, though. What about longer quote blocks?
This, again, is Google’s sophistication. Google is able to read context and can identify when a piece of content is quoted in part, but surrounded by unique content. My quote block up above is easy to tell that it’s just a quote and not a full stolen blog post.
Partial copied content that is used without attribution or passed off as original by the thief is harder to detect, but it can run afoul of Google’s push to maintain a unique selection of search results. If one page was published in January, and another with largely the same content – and nothing new of value – is published in March, Google is more likely to use the earlier one. Duplicating content mostly just means your post won’t rank in comparison to the original.
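For the full-page case, the rel="canonical" tag mentioned earlier is a single link element in the page’s head that points at the preferred version of the content. Here’s a small Python sketch, using only the standard library, that checks whether a page declares one. The sample page and the example.com URL are made up for illustration.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values; attribute values may be None.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical_urls.append(attributes["href"])

def find_canonical(html):
    """Return the first canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical_urls[0] if finder.canonical_urls else None

# Hypothetical syndicated copy that points back at the original post.
page = """
<html><head>
<title>Syndicated copy</title>
<link rel="canonical" href="https://example.com/original-post">
</head><body>...</body></html>
"""
print(find_canonical(page))
```

Running a check like this across every copy of a syndicated post is a quick way to confirm that each version points back at the one you want ranked.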
The third, implicit penalty comes when your site is discovered and penalized for plagiarism, or has its pages removed from search results through DMCA notices: damage to your reputation. You become a known content thief and can have all your guest posts, all your links, all your references disappear overnight. Legitimate marketers don’t want to work with spammers.
Back to the original question at hand, there are a few different perspectives you can use when you’re considering the answer to this question.
First: the puritan point of view. In your mind, everything needs to be 100% unique. Any result, even a 2% plagiarized result in Copyscape, is too much. You strive to make everything original.
My view on this perspective is that it’s unrealistic. Yes, everything you write should be unique. However, we live in a society, and we participate in a community. Without building upon the work of others, we cannot progress. Quoting those with more experience than you and referencing points you want to support or refute can trigger plagiarism checkers, but these are entirely appropriate uses and, more importantly, they are how you participate in the community.
Second: the standard point of view. I basically just went over this; you view your work in the larger context of the community in which you’re working. You quote others with more knowledge in certain areas, and you don’t mind when others quote you. So long as you aren’t exceeding 10% of your post marked as plagiarism (according to checkers), you’ll be fine.
My view on this is that it’s the most typical position to take. Big names in your industry are going to reference others. News articles are going to quote from press releases. Interviews quote the same thing from the same person on different sites. It’s just how the world works.
The important part is attribution. Copied content is only stolen content if you’re trying to pass it off as your own.
Third: the pragmatist point of view. This is the “gray hat” point of view, where you’ll do whatever you want, so long as it works. If that means making a blog out of 50% copied content in every post – or spinning content, or what have you – then so be it. As long as you aren’t eating penalties for it, what does it matter?
My view on this perspective is that it’s all too common and that it’s always going to exist. If Google decides any site with more than 50% copied content should be removed from their index, all of those sites will change up just enough to hit 49%. It’s an endless arms race, and there’s no solution to the problem.
I don’t recommend that you partake in this perspective, however. It might be effective on a small scale, but you’ll never be the next Forbes, the next VentureBeat, or the next Site You Look Up To with that kind of attitude. When you’re fighting to survive rather than fighting to thrive, your ceiling for performance is much lower.
Fourth: the unrepentant thief. There are people out there who will shamelessly steal content from anyone and everyone, just to make a quick buck. If they get a DMCA notice, so what? They remove that content. They hide behind false information so they’re impossible to sue, and they simply make their living as small-time internet criminals.
My view on this one is, of course, that it’s pretty dumb. These people put a lot of effort into trying to skirt the law when that same amount of effort would establish them as a legitimate player quite easily. There are just some people who will always think that doing things in a way that “works” and is outside the rules is somehow better. They’re wrong, but they will also always exist.
Where do you fit? What’s your point of view?
F68.10 says: “The first penalty is rare, but very damaging. This is the penalty wherein the original creator of the content discovers that you plagiarized their content and sues you.”
Are there any kind of official statistics as to how often suing happens in the US on the topic of plagiarism?
James Parsons says: On the topic of web content, I think this is extremely rare, especially if the two businesses involved aren’t in the same country. To be safe, always cite your sources if using content from another website!