De-indexing is the process of removing a web page, or a group of web pages, from a search engine's index so that they no longer appear in search results.
A page can get de-indexed for two reasons:
1- You choose to de-index it.
2- Google de-indexes your website or web pages.
Why Would You Want to De-Index a Page from Search Engines?
Have you ever wondered how search engines return results so quickly when you type a keyword or phrase into the search bar? Try it: type “SEO Services” and within moments hundreds of results appear. Contrary to a popular belief, search engines aren't crawling websites at the moment you search. Instead, they look up relevant pages in their index, a database of pages they have already crawled, and show those results.
Search engine crawlers crawl your website and save its pages in the search engine's database (the index). When a searcher enters a query, the search engine shows relevant pages from that index. If a web page is not indexed, it will not appear on SERPs, no matter what query is typed into the search bar.
Now the question arises: why would we want to hide or de-index pages from search engines? We all want as many people as possible to find our websites, and the more pages are indexed, the more chances we have to get traffic. But that's not always the case: sometimes it is best practice to keep some web pages out of search engines. If your website has any of the following issues, consider de-indexing:
- Outdated Content: Your website may have blog posts that are out of date, but you don't want to delete them because you plan to update them later. De-index them to keep them hidden from search engines for now, and get them indexed again once you have updated them.
- Admin Pages: De-index pages that are meant for website admins rather than for searchers, since you don't want any search traffic landing on them.
- Duplicate Content: Duplicate content can hurt your rankings, and content copied to manipulate search results violates Google's guidelines. If your website has duplicate content, add canonical tags to those pages. A canonical tag tells search engines which URL is the preferred version, so when two or more pages have the same content, search engines treat the canonical URL as the main page and consolidate the duplicates into it instead of showing each copy separately.
- Pages With Little or No Content: Say a student fills in an enrollment form on your website. After the form is submitted, a new page appears that says “We will contact you within 24 hours.” Because this page has so little content, Google and other search engines don't find it useful, so keep pages like this out of the index.
- Gated Content Pages: These are pages that appear only after a visitor performs a specific action on your website, such as thank-you pages and confirmation pages.
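For the duplicate-content case, a canonical tag is a link element in the page's head, written as <link rel="canonical" href="...">. The sketch below is a minimal Python check, using only the standard library's html.parser, that reads which URL a page declares as canonical. The class name, function name, and sample URL are hypothetical, and the sketch only looks at the HTML tag, not at canonical hints sent any other way.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical duplicate page pointing at its preferred version.
page = '<head><link rel="canonical" href="https://example.com/seo-services/"></head>'
print(find_canonical(page))  # https://example.com/seo-services/
```

Running this over each copy of a duplicated page is a quick way to confirm they all point at the same preferred URL.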
These are a few cases where de-indexing a web page is a good solution. Now you may be wondering how to de-index pages that are already indexed, or how to stop crawlers from indexing new pages. Below are a few ways to do it.
- Robots.txt
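Your website can have a robots.txt file, placed at the root of the domain, that communicates with search engines. It tells search engine crawlers/bots not to crawl the URLs it lists. To block a URL, add a line with “Disallow”, a colon and a space, and then the relative URL, under a “User-agent” line that names the crawlers the rule applies to (“*” means all crawlers): “Disallow: /relative-URL/”. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it, so use the “noindex” tag when a page must stay out of the index.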
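A robots.txt file that blocks several pages might look like the one embedded in the sketch below, where the blocked paths are hypothetical placeholders for your own URLs. The sketch uses Python's standard urllib.robotparser to check which URLs a crawler may request. It is only a rough check: this parser does simple prefix matching and may not handle the wildcard patterns Google supports, and it answers a crawling question, not an indexing one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the blocked paths are placeholders for your own URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /thank-you/
Disallow: /old-blog-post/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

for url in (
    "https://example.com/wp-admin/settings",
    "https://example.com/thank-you/",
    "https://example.com/services/",
):
    # can_fetch() answers "may this crawler request this URL?" (crawling, not indexing)
    print(url, parser.can_fetch("Googlebot", url))
```

Run as-is, the first two URLs report False and the last one reports True, because only paths starting with a Disallow prefix are blocked.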
- “NoIndex” Tag:
A “noindex” tag is another way to tell search engine crawlers/bots not to index a page. Crawlers must be able to fetch the page to see the tag, so don't also block that page in robots.txt. Add the following tag to the head section of the page's HTML:
<meta name="robots" content="noindex">
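To confirm that a page actually carries the tag, you could scan its HTML for a robots meta tag. The sketch below is a minimal Python version built on the standard library's html.parser; the class and function names are hypothetical. It only checks the meta tag with name="robots", so it will miss directives sent in an X-Robots-Tag HTTP header or in crawler-specific tags such as name="googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, follow"
        for directive in (attr_map.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """True if a robots meta tag in the HTML contains the noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))  # True
print(has_noindex('<head><meta name="description" content="noindex"></head>'))     # False
```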
- Sitemap:
Sitemap.xml is a file that lists the pages (URLs) on a website. Search engines check a site's sitemap.xml when crawling and indexing its pages, and URLs listed in the sitemap are more likely to be crawled and indexed, so leave out the pages you want de-indexed. You can submit your sitemap to Google through Google Search Console.
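If you generate your sitemap yourself, a minimal sketch could build it from a list of page URLs while skipping the ones you want kept out of search results. The sketch below uses Python's standard xml.etree.ElementTree and the sitemaps.org namespace; the function name and the page list are hypothetical, and real sitemaps often carry optional fields (such as lastmod) that are left out here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, excluded=frozenset()):
    """Return sitemap.xml text listing every URL in urls that is not in excluded."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if url in excluded:
            continue  # keep de-indexed pages out of the sitemap
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes characters such as &
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical site pages; the thank-you page is the one being de-indexed.
pages = [
    "https://example.com/",
    "https://example.com/seo-services/",
    "https://example.com/thank-you/",
]
print(build_sitemap(pages, excluded={"https://example.com/thank-you/"}))
```

Leaving a URL out of the sitemap does not remove it from the index by itself; it simply stops you from actively pointing crawlers at it.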
Join the SEO Training Course to learn more about how to de-index web pages and which pages need to be de-indexed.
Reasons Why Google De-Indexes Websites / Web Pages
- Cloaking
Cloaking is a technique that shows one version of content to search engines and a different version to users. For example, a website might appear to share a course outline but secretly link to pornographic content.
Cloaking is done based on the visitor's IP address or user agent. When a search engine crawler/spider visits the website, the ‘clean’ version of the site is shown, but when a human visits, the site shows its real content. Google's penalty for cloaking comes in two forms:
- Partial Penalty: This affects only specific pages or parts of your site.
- Sitewide Penalty: This affects your entire website.
Showing content to search engines while restricting it for users is also treated as cloaking, unless it is done in accordance with Google's First Click Free policy (which does not apply to websites that require a signup or sign-in to see the full content).
Image cloaking is another form: serving images that are covered by another image, that differ from the images shown to users, or that redirect the user to a different image.
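To check your own site for cloaking you didn't intend (for example, content injected by someone else), one approach is to fetch a page once with a crawler user agent and once with a browser user agent, then compare the text. The sketch below covers only the comparison step and leaves the fetching out. The function names and the 0.8 threshold are assumptions for illustration, not values Google publishes, and IP-based cloaking will not show up in a user-agent swap at all.

```python
import re
from difflib import SequenceMatcher


def visible_words(html: str) -> list:
    """Very crude text extraction: strip tags, lowercase, split on whitespace."""
    return re.sub(r"<[^>]+>", " ", html).lower().split()


def looks_cloaked(html_for_bot: str, html_for_user: str, threshold: float = 0.8) -> bool:
    """Flag the page if the two versions share too little text.

    threshold is an arbitrary assumption chosen for this sketch.
    """
    similarity = SequenceMatcher(
        None, visible_words(html_for_bot), visible_words(html_for_user)
    ).ratio()
    return similarity < threshold


# Hypothetical responses for the same URL.
bot_version = "<h1>SEO course outline</h1><p>Module 1: keyword research</p>"
user_version = "<h1>Buy cheap pills</h1><p>Click here now</p>"
print(looks_cloaked(bot_version, bot_version))   # False
print(looks_cloaked(bot_version, user_version))  # True
```

Small differences such as rotating ads or timestamps are normal, which is why the sketch compares overall similarity instead of requiring identical pages.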
- Spam
Google removes websites and web pages that use tricks or tactics that go against its guidelines. According to Google, using any of the following tactics may get your website de-indexed:
- Automatically generated content
- Participating in link schemes
- Sneaky redirects
- Hidden links
- Scraped content
- Participating in affiliate programs without adding enough value
- Creating pages with malicious behavior
- Sending automated queries to Google
- Free Hosting
Don't trust “free hosting” providers. There is no such thing as free web hosting!
Free web hosting often comes with spammy ads and poor service, which go against Google's guidelines, and Google has said it takes action against spammy free hosting services.
To avoid this penalty, buy reliable, SEO-friendly hosting.
- User-Generated Spam
This spam is created by humans or bots that leave comments on forums, in comment boxes, and on user profiles. You can spot spammy comments by looking for unusual usernames or email addresses, or for content unrelated to the topic.
To prevent this spam, it's a good idea to delete all spam comments (on WordPress, check out the Akismet plugin).
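The two warning signs above, unusual usernames and off-topic content, can be turned into simple checks. The sketch below is a toy Python illustration of those two signals only; the rules (a run of four or more digits in a username, no shared words with the post's topic) are assumptions made up for this example, and this is not how Akismet or any other spam filter works.

```python
import re
from dataclasses import dataclass


@dataclass
class Comment:
    username: str
    body: str


def spam_signals(comment: Comment, topic_words: set) -> list:
    """Return human-readable reasons a comment looks like spam (hypothetical rules)."""
    signals = []
    # Assumption: a long run of digits in a username counts as "unusual".
    if re.search(r"\d{4,}", comment.username):
        signals.append("unusual username")
    words = set(re.findall(r"\w+", comment.body.lower()))
    # Assumption: sharing no words with the post's topic means the comment is off-topic.
    if not words & topic_words:
        signals.append("off-topic content")
    return signals


topic = {"seo", "noindex", "robots", "sitemap"}
print(spam_signals(Comment("maria", "Great tips on robots.txt and noindex!"), topic))
# []
print(spam_signals(Comment("bestdeals98231", "Cheap watches, visit my shop"), topic))
# ['unusual username', 'off-topic content']
```

Flagged comments would still need a human look before deletion, since short genuine replies can easily trip the off-topic rule.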
- Keyword Stuffing
Keyword stuffing is an SEO tactic used to try to rank websites by adding keywords to a web page many times.
Keyword stuffing tactics include:
- Adding unrelated keywords. For example, your website is about digital marketing services, but you add irrelevant keywords related to blogging in the hope of getting more traffic. This is not a good SEO practice, and Google will take action against it.
- Repeating keywords. Google recognizes unnecessary keyword repetition, including repetition of different variations of a keyword. For example: We sell leather women purses. Our leather women purses are perfect for ladies. If you’re thinking of buying leather women purses, don’t hesitate to call our leather women purses shop.
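To see how heavily a phrase is repeated, you could count its occurrences and the share of words it takes up. The sketch below is a minimal Python version applied to the example text above; the function names are hypothetical. Google does not publish a "safe" keyword density, so the number is only a way to spot obvious repetition, not a target to aim for.

```python
import re


def words_in(text: str) -> list:
    """Lowercased words; apostrophes (straight or curly) stay inside words like you’re."""
    return re.findall(r"[\w'’]+", text.lower())


def phrase_count(text: str, phrase: str) -> int:
    """Count non-overlapping occurrences of phrase as a whole-word sequence."""
    target = words_in(phrase)
    words = words_in(text)
    n = len(target)
    count, i = 0, 0
    while i <= len(words) - n:
        if words[i:i + n] == target:
            count += 1
            i += n
        else:
            i += 1
    return count


def phrase_density(text: str, phrase: str) -> float:
    """Share of the text's words taken up by the phrase."""
    total = len(words_in(text))
    if total == 0:
        return 0.0
    return phrase_count(text, phrase) * len(words_in(phrase)) / total


example = ("We sell leather women purses. Our leather women purses are perfect for ladies. "
           "If you’re thinking of buying leather women purses, don’t hesitate to call "
           "our leather women purses shop.")
print(phrase_count(example, "leather women purses"))    # 4
print(phrase_density(example, "leather women purses"))  # 0.4
```

In this 30-word example, the 3-word phrase appears 4 times, so 12 of the 30 words belong to it.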
- Thin Content
As many content writers and SEO copywriters can tell you from experience, creating high-quality, optimized content is no easy task!
In SERPs, Google tries to display the most relevant content that matches the user's query.
Many SEOs and website owners take a shortcut: they copy content from high-ranking websites (also known as duplicate content) and run it through automatic content-rewriting software (commonly known as article spinning).
Google describes thin content as pages with little or no original content. Having thin content can get your site de-indexed.
- Duplicate Content
Duplicate content is content copied from other websites and published on your own. It isn't limited to reusing other websites' content: publishing the same content on multiple pages of your own domain also counts as duplicate content.
Google publishes advice on fixing duplicate content issues in its Search Central documentation.
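One common textbook way to spot near-duplicate pages on your own domain is to split each page's text into overlapping word sequences ("shingles") and measure how many they share with Jaccard similarity. The sketch below shows that technique in Python with hypothetical page text; it is not a description of how Google detects duplicate content, and the shingle size of 3 is an assumption.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word sequences ("shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Overlap of two shingle sets: 1.0 means identical, 0.0 means nothing shared."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical product pages that differ by a single word.
page_a = "Leather purses for women in many colours and sizes"
page_b = "Leather purses for women in many colours and styles"
print(jaccard(shingles(page_a), shingles(page_b)))  # 0.75
```

Here 6 of the 8 distinct shingles are shared, so the pages score 0.75; pairs scoring high would be candidates for merging or for a canonical tag.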
- Doorway Pages
Doorway pages (also known as jump pages, bridge pages, gateway pages, and portal pages) are pages created to rank for specific search queries.
From Google's point of view, doorway pages can lead to multiple similar pages in the search results, each of which takes the user to essentially the same destination.
Example: blog pages created for specific keywords or queries, each containing targeted content and linking to the main page for details. This tactic confuses users, who pass through several pages only to end up on the same main page.
- Unnatural Links to/from Your Website
A key component of SEO is an off-page strategy, which includes link building.
Link building is essential and plays an important role in ranking a website, because links help Google assess a website's authority: the more high-quality links a site has, the better its chances of ranking.
However, many SEOs and website marketers submit their website links to irrelevant sites in bulk, which can get their websites de-indexed.
Hopefully, this article has helped you understand why web marketers need to de-index some web pages from Google, as well as the reasons why Google de-indexes web pages and websites.