Duplicate Content: Why does it happen and how to fix issues (2023)

What is duplicate content?

Duplicate content is content that appears on the Internet in more than one place. That “one place” is defined as a location with a unique website address (URL) - so, if the same content appears at more than one web address, you’ve got duplicate content.

While not technically a penalty, duplicate content can still sometimes impact search engine rankings. When there are multiple pieces of, as Google calls it, "appreciably similar" content in more than one location on the Internet, it can be difficult for search engines to decide which version is more relevant to a given search query.

Why does duplicate content matter?

For search engines

Duplicate content can present three main issues for search engines:

  1. They don't know which version(s) to include/exclude from their indices.

  2. They don't know whether to direct the link metrics (trust, authority, anchor text, link equity, etc.) to one page, or keep them separated between multiple versions.

  3. They don't know which version(s) to rank for query results.

For site owners

When duplicate content is present, site owners can suffer rankings and traffic losses. These losses often stem from two main problems:

  1. To provide the best search experience, search engines will rarely show multiple versions of the same content, and thus are forced to choose which version is most likely to be the best result. This dilutes the visibility of each of the duplicates.

  2. Link equity can be further diluted because other sites have to choose between the duplicates as well. Instead of all inbound links pointing to one piece of content, they link to multiple pieces, spreading the link equity among the duplicates. Because inbound links are a ranking factor, this can then impact the search visibility of a piece of content.

The net result? A piece of content doesn't achieve the search visibility it otherwise would.

How do duplicate content issues happen?

In the vast majority of cases, website owners don't intentionally create duplicate content. But that doesn't mean it's not out there. In fact, by some estimates, up to 29% of the web is actually duplicate content!

Let's take a look at some of the most common ways duplicate content is unintentionally created:

1. URL variations

URL parameters, such as click tracking and some analytics code, can cause duplicate content issues. This can be a problem caused not only by the parameters themselves, but also the order in which those parameters appear in the URL itself.

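For example, the same page may be reachable both with and without a click-tracking parameter, or with the same set of parameters listed in a different order. Each of those addresses counts as a separate URL.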

Similarly, session IDs are a common duplicate content creator. This occurs when each user that visits a website is assigned a different session ID that is stored in the URL.

Printer-friendly versions of content can also cause duplicate content issues when multiple versions of the pages get indexed.

One lesson here is that when possible, it's often beneficial to avoid adding URL parameters or alternate versions of URLs (the information those contain can usually be passed through scripts).
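To make the parameter problem concrete, here is a minimal Python sketch of how you might collapse URL variations into a single form before comparing them. The parameter names in TRACKING_PARAMS and the example.com URLs are assumptions chosen for illustration; your own site will have its own list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that never change what the page shows.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "print"}


def normalize_url(url: str) -> str:
    """Drop tracking parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    a = normalize_url("https://www.example.com/widgets?color=blue&cat=3")
    b = normalize_url("https://www.example.com/widgets?cat=3&color=blue&utm_source=news")
    print(a == b)
```

Two URLs that normalize to the same string are candidates for canonicalization using one of the methods described later in this article.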

2. HTTP vs. HTTPS or WWW vs. non-WWW pages

If your site has separate versions at "www.site.com" and "site.com" (with and without the "www" prefix), and the same content lives at both versions, you've effectively created duplicates of each of those pages. The same applies to sites that maintain versions at both http:// and https://. If both versions of a page are live and visible to search engines, you may run into a duplicate content issue.
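As a rough illustration, the sketch below maps any protocol or "www" variant of a site onto one preferred version. The preferred scheme, the host names, and the function name are all assumptions made for the example, not settings taken from any particular server.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"            # assumption: HTTPS is the preferred protocol
PREFERRED_HOST = "www.example.com"    # assumption: the "www" version is preferred
SITE_HOSTS = {"example.com", "www.example.com"}  # all hosts serving the same content


def preferred_url(url):
    """Return the preferred-version URL, or None if no change is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in SITE_HOSTS:
        return None  # not one of our hosts, leave it alone
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred version
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, "")
    )


if __name__ == "__main__":
    print(preferred_url("http://example.com/page?x=1"))
```

A server would typically answer any request for which this function returns a URL with a 301 redirect to that URL.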

3. Scraped or copied content

Content includes not only blog posts or editorial content, but also product information pages. Scrapers republishing your blog content on their own sites may be a more familiar source of duplicate content, but there's a common problem for e-commerce sites, as well: product information. If many different websites sell the same items, and they all use the manufacturer's descriptions of those items, identical content winds up in multiple locations across the web.
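If you want a rough sense of how similar two pieces of text are (say, your product description and a competitor's), one common approach is to compare overlapping word sequences. The sketch below uses word shingles and Jaccard similarity. This is an illustrative technique only; it is not a description of how any search engine decides that content is "appreciably similar," and the function names are made up for the example.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    ours = "sturdy blue widget with a lifetime warranty"
    theirs = "sturdy blue widget with a one year warranty"
    print(round(jaccard_similarity(ours, theirs), 2))
```

A score close to 1.0 suggests the descriptions are nearly identical, which is a hint that writing your own product copy could help.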

How to fix duplicate content issues

Fixing duplicate content issues all comes down to the same central idea: specifying which of the duplicates is the "correct" one.

Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines. Let's go over the three main ways to do this: Using a 301 redirect to the correct URL, the rel=canonical attribute, or using the parameter handling tool in Google Search Console.

301 redirect

In many cases, the best way to combat duplicate content is to set up a 301 redirect from the "duplicate" page to the original content page.

When multiple pages with the potential to rank well are combined into a single page, they not only stop competing with one another; they also create a stronger relevancy and popularity signal overall. This will positively impact the "correct" page's ability to rank well.
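Here is a minimal sketch of what a 301 redirect looks like at the application level, written as a small Python WSGI app. The paths in REDIRECTS are hypothetical; in practice, redirects are often configured in the web server or CMS rather than in application code.

```python
# Hypothetical mapping of duplicate paths to the page that should rank.
REDIRECTS = {
    "/blue-widgets-old": "/blue-widgets",
    "/print/blue-widgets": "/blue-widgets",
}


def app(environ, start_response):
    """Minimal WSGI app: answer known duplicates with a permanent (301) redirect."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"original content"]
```

To try it locally, you could serve app with wsgiref.simple_server.make_server from the Python standard library and request one of the duplicate paths.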

Rel="canonical"

Another option for dealing with duplicate content is to use the rel=canonical attribute. This tells search engines that a given page should be treated as though it were a copy of a specified URL, and all of the links, content metrics, and "ranking power" that search engines apply to this page should actually be credited to the specified URL.

The rel="canonical" attribute is part of the HTML head of a web page and looks like this:

General format:

<head>
  ...[other code that might be in your document's HTML head]...
  <link href="URL OF ORIGINAL PAGE" rel="canonical" />
  ...[other code that might be in your document's HTML head]...
</head>

The rel=canonical attribute should be added to the HTML head of each duplicate version of a page, with the "URL OF ORIGINAL PAGE" portion above replaced by a link to the original (canonical) page. (Make sure you keep the quotation marks.) The attribute passes roughly the same amount of link equity (ranking power) as a 301 redirect, and, because it's implemented at the page (instead of server) level, often takes less development time to implement.
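If your pages are generated by a template or script, the tag can be built programmatically. The following is a small sketch, with a hypothetical helper name, that produces the link element in the general format shown above and escapes the URL so that characters such as & and quotation marks don't break the HTML.

```python
from html import escape


def canonical_link_tag(original_url: str) -> str:
    """Build a rel=canonical link element pointing at the original page."""
    # quote=True also escapes quotation marks, keeping the attribute intact.
    return f'<link href="{escape(original_url, quote=True)}" rel="canonical" />'


if __name__ == "__main__":
    print(canonical_link_tag("https://www.example.com/blue-widgets?cat=3&color=blue"))
```

The returned string would be placed inside the HTML head of each duplicate version of the page.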

Below is an example of what a canonical attribute looks like in action:

[Figure: Using MozBar to identify canonical attributes.]

Here, we can see BuzzFeed is using the rel=canonical attributes to accommodate their use of URL parameters (in this case, click tracking). Although this page is accessible by two URLs, the rel=canonical attribute ensures that all link equity and content metrics are awarded to the original page (/no-one-does-this-anymore).
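You can also check a page's canonical attribute yourself without a browser toolbar. The sketch below uses Python's built-in HTML parser to pull the canonical URL out of a page's markup. The class and function names are made up for this example, and the sample HTML uses a placeholder example.com address.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


if __name__ == "__main__":
    page = '<head><link href="https://www.example.com/post" rel="canonical" /></head>'
    print(find_canonical(page))
```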

Meta Robots Noindex

One meta tag that can be particularly useful in dealing with duplicate content is meta robots, when used with the values "noindex, follow." Commonly called Meta Noindex,Follow and technically written as content="noindex,follow", this meta robots tag can be added to the HTML head of each individual page that should be excluded from a search engine's index.

General format:

<head>
  ...[other code that might be in your document's HTML head]...
  <meta name="robots" content="noindex,follow">
  ...[other code that might be in your document's HTML head]...
</head>

The meta robots tag allows search engines to crawl the links on a page but keeps them from including the page itself in their indices. It's important that the duplicate page can still be crawled, even though you're telling Google not to index it, because Google explicitly cautions against restricting crawl access to duplicate content on your website. (Search engines like to be able to see everything in case you've made an error in your code. It allows them to make a [likely automated] "judgment call" in otherwise ambiguous situations.)

Using meta robots is a particularly good solution for duplicate content issues related to pagination.
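To confirm that a page really carries the directives you intended, you could parse its HTML and read the meta robots tag. The following is a minimal sketch using Python's standard library; the class and function names are hypothetical, and it only looks at the generic "robots" meta tag, not crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated; spacing and case do not matter.
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html_text):
    """Return the set of meta robots directives found in the HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return finder.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex,follow"></head>'
    print("noindex" in robots_directives(page))
```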

Preferred domain and parameter handling in Google Search Console

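Google Search Console has historically allowed you to set the preferred domain of your site (e.g. http://yoursite.com instead of http://www.yoursite.com) and specify whether Googlebot should crawl various URL parameters differently (parameter handling). Note that Google retired the preferred domain setting in 2019 and the URL Parameters tool in 2022, so on current versions of Search Console you'll need to rely on redirects and rel=canonical instead.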

Depending on your URL structure and the cause of your duplicate content issues, setting up either your preferred domain or parameter handling (or both!) may provide a solution.

The main drawback to using parameter handling as your primary method for dealing with duplicate content is that the changes you make only work for Google. Any rules put in place using Google Search Console will not affect how Bing or any other search engine's crawlers interpret your site; you'll need to use the webmaster tools for other search engines in addition to adjusting the settings in Search Console.

Additional methods for dealing with duplicate content

  1. Maintain consistency when linking internally throughout a website. For example, if a webmaster determines that the canonical version of a domain is www.example.com/, then all internal links should go to http://www.example.co... rather than http://example.com/pa... (notice the absence of www).

  2. When syndicating content, make sure the syndicating website adds a link back to the original content and not a variation on the URL. (Check out our Whiteboard Friday episode on dealing with duplicate content for more information.)

  3. To add an extra safeguard against content scrapers stealing SEO credit for your content, it's wise to add a self-referential rel=canonical link to your existing pages. This is a canonical attribute that points to the URL it's already on, the point being to thwart the efforts of some scrapers.

    A self-referential rel=canonical link: The URL specified in the rel=canonical tag is the same as the current page URL.

    While not all scrapers will port over the full HTML code of their source material, some will. For those that do, the self-referential rel=canonical tag will ensure your site's version gets credit as the "original" piece of content.
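As a quick sanity check for the self-referential canonical described in item 3, you could compare a page's own URL with the URL in its canonical tag. The sketch below does a simple comparison that ignores letter case in the scheme and host and ignores any fragment. The function name and the URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def is_self_referential(page_url, canonical_url):
    """True when the canonical URL points back at the page it appears on."""
    page, canonical = urlsplit(page_url), urlsplit(canonical_url)

    def key(parts):
        # Scheme and host are case-insensitive; an empty path means "/".
        return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)

    return key(page) == key(canonical)


if __name__ == "__main__":
    # On your own site the two match; on a scraped copy they would not.
    print(is_self_referential("https://www.example.com/post", "https://www.example.com/post"))
    print(is_self_referential("https://scraper.example/post", "https://www.example.com/post"))
```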

FAQs

What causes duplicate content? ›

Duplicate content is often due to an incorrectly set up web server or website. These occurrences are technical in nature and will likely never result in a Google penalty. They can seriously harm your rankings, though, so it's important to make it a priority to fix them.

Why is having duplicate content an issue for SEO? ›

Too much duplicate content within a website (or on the web, in general) can confuse search engines, and the wrong page does occasionally rank over the right one. This can lead to SERP results that aren't as accurate as they should be, which can frustrate users, hurt your traffic, and raise bounce rates.

How do you best describe duplicate content? ›

Duplicate content is the same content appearing on two or more unique URLs. "Same content" is defined as blocks of content that are "appreciably similar," which can range from exact copies to content that contains chunks of copied text.

How do I remove duplicate content from my website? ›

Here are the three ways I recommend.
  1. The rel=canonical tag. In most cases, the rel=canonical tag is the best way to transfer SEO juice from a duplicate page to another page. ...
  2. 301 redirect. Sometimes, you don't want the duplicate of your web page to stick around. ...
  3. Set passive parameters in Google Search Console.

What are the content issues? ›

These issues relate to your content and how Google views it. Duplicate, thin, and slow content, for example, can negatively impact rankings. Content issues vary in severity, but resolving them may improve your site's ability to rank.

Why should you avoid duplicate content? ›

The Impact of Duplicate Content

  1. Key pages unexpectedly not performing well in SERPs or experiencing indexing problems.
  2. Fluctuations or decreases in core site metrics (traffic, rank positions, or E-A-T criteria).
  3. Other unexpected actions by search engines as a result of confusing prioritization signals.

Is all duplicate content bad? ›

External duplicate content, if created intentionally, can't cause any harm. Still, you must identify which version of your content is the original, as that will be the version that gets indexed. Internal content duplication can cause Google to link to the wrong page on your site.

How does SEO handle duplicate content? ›

Restructuring a link format too can create multiple copies of the same content. To reduce the impact of such duplicate content issues, set up 301 redirects. 301 redirects from the non-preferred URLs of a resource to their preferred URLs are a great way to alert the search engines about your preference.

How does SEO check for duplicate content? ›

Using Google to check for Duplicate Content

One quick way to check if a page may be considered duplicate is by copying around ten words from the start of a sentence and then pasting it with quotes into Google. This is actually Google's recommended way to check.

How does Google determine duplicate content? ›

Pick a canonical URL for each of your pages and submit them in a sitemap. All pages listed in a sitemap are suggested as canonicals; Google will decide which pages (if any) are duplicates, based on similarity of content.
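As a rough sketch of that step, the Python snippet below builds a minimal XML sitemap from a list of canonical URLs using the standard sitemaps.org namespace. The function name and the example URLs are placeholders, and a real sitemap may include optional fields that are left out here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls):
    """Return a minimal sitemap XML string listing only canonical URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in canonical_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        "https://www.example.com/",
        "https://www.example.com/blue-widgets",
    ]))
```

Only the preferred version of each page should go into the list; leaving the duplicates out reinforces which URL you consider canonical.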

Can you have duplicate content on your own website? ›

Duplicate content is just what it sounds like. It's when the same copy appears on two or more web pages. Duplicate content can occur on your own site or on another site you don't control.

How would you minimize duplicate content thin content risks on your site with pagination? ›

By making the relations between a series of pages clear to search engines using the pagination attributes, you give search engines context and prevent duplicate content.

What is it called when you copy the content of another website and make it your own? ›

Website cloning is the process of creating a replica of your existing website design or content to create a new website with ease.

How much duplicate content is acceptable? ›

According to Matt Cutts, 25% to 30% of the web consists of duplicate content. According to him, Google doesn't consider duplicate content as spam, and it doesn't lead to your site being penalized unless it is intended to manipulate the search results.

How do I find duplicate websites? ›

For checking entire websites for duplicate content, there is Siteliner. Simply paste your site's URL in the box and it will scan for duplicate content, page load time, the number of words per page, internal and external links, and much more.

How do I fix canonical issues? ›

There are two main ways to fix canonical issues on a website: by implementing 301 redirects, and/or by adding canonical tags to your site's pages to tell Google which of several similar pages is preferred. The right option depends on the canonical issue you're trying to resolve.

How do I fix duplicates without user selected canonical? ›

There are two different methods you can use to fix the 'Duplicate without user-selected canonical' status: Method 1: 301 Redirects. Method 2: Using Canonical URL.

Can you rank duplicate content? ›

In general, Google doesn't want to rank pages with duplicate content. In fact, Google states that: “Google tries hard to index and show pages with distinct information”. So if you have pages on your site WITHOUT distinct information, it can hurt your search engine rankings.

How do you prevent duplicate content on product pages? ›

Product Review Pages

Product review pages that repeat the main product page's content should either be canonicalized to the main product page or set to "noindex,follow" via meta robots or the X-Robots-Tag header. We recommend the first method, just in case a link to a review page occurs on an external website.

How does Google identify duplicate content? ›

Google uses a predictive method to detect duplicate content based on URL patterns, which could lead to pages being incorrectly identified as duplicates. In order to prevent unnecessary crawling and indexing, Google tries to predict when pages may contain similar or duplicate content based on their URLs.

How do you stop content from showing up in the search results? ›

3 Ways To Hide Content From Search Engines
  1. Password Protection. Locking a website down with a password is often the best approach if you want to keep your site private. ...
  2. Block Crawling. Another way to stop Googlebot from accessing your site is by blocking crawling. ...
  3. Block Indexing.

Can I have 2 websites with the same content? ›

Don't think you'll save time maintaining multiple websites by posting the same content on each site. Although this might seem like a great way to improve your SEO, search engines like Google see duplicate content as creating a poor user experience and as a result, they've adjusted their algorithms to penalize it.

What is one of the ways duplication can occur on an e-commerce site? ›

The most blatant form of duplicate content on an e-commerce site is at the category level. This happens if multiple categories target the exact same type of product (for example, “Men's Boots” and “Boots for Men”).

Which SEO technique is used to display duplicate content on a webpage? ›

Black hat SEO refers to website optimization techniques that search engines do not approve of. Sites built this way often consist mainly of duplicate content and are used mostly to redirect users to other websites and generate traffic.

Article information

Author: Allyn Kozey

Last Updated: 03/24/2023

Author information

Name: Allyn Kozey

Birthday: 1993-12-21

Address: Suite 454 40343 Larson Union, Port Melia, TX 16164

Phone: +2456904400762

Job: Investor Administrator

Hobby: Sketching, Puzzles, Pet, Mountaineering, Skydiving, Dowsing, Sports

Introduction: My name is Allyn Kozey. I am an outstanding, colorful, adventurous, encouraging, zealous, tender, helpful person who loves writing and wants to share my knowledge and understanding with you.