Googlebot follows rel="nofollow" links

@Gretchen104

Posted in: #Google #Googlebot #Nofollow #Noindex

I would like to hide all pages of our website from being read or indexed by Googlebot, except several thousand of pages which were selected for indexing.

The site was launched in April 2015. The total number of pages on the site is effectively unbounded, because the content is generated dynamically based on parameters selected by the end user.

I've selected 128,000 pages for indexing; these pages are listed in sitemaps.
All other pages contain the meta tag <meta name="robots" content="noindex"/>, and all links to these pages are tagged with rel="nofollow".
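To make the setup concrete, the markup on the two kinds of pages would look roughly like this (the URL is taken from the examples further down; the anchor text is hypothetical):

<!-- On every page that should stay out of the index -->
<meta name="robots" content="noindex" />

<!-- On any page that links to an excluded page -->
<a href="/en/indicators/GDP_current_prices/India-Philippines/" rel="nofollow">India vs. Philippines</a>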

However, during the site's first two months these tags (noindex and rel="nofollow") were not yet present on the pages. Today, the total number of indexed pages exceeds 4 million.

Googlebot continues to read the unwanted pages, and according to server stats it appears that Googlebot ignores the "nofollow" attributes. Why does this happen? What should be done to stop it from reading and indexing these unwanted pages?

The total number of indexed pages is still increasing.

How can I request a re-scan of the previously indexed pages (so that Googlebot can re-read <meta name="robots" content="noindex" />)?

Example of a page that should be indexed: example.com/en/indicators/GDP_current_prices/Philippines
Example of a page that should not be indexed: example.com/en/indicators/GDP_current_prices/India-Philippines/


1 Comment


@Si4351233

rel="nofollow" is not meant to prevent indexing of the linked page as that is the way that Google locates new pages on the internet. All it does is tell Google not to pass link juice to that linked page in an attempt to mitigate link spamming. The only way to block those old pages from Google would be to add each one to your robots.txt file as a disallowed page then give it a few weeks until Google reindexes your whole site and sees the robots.txt file. Once it see's that it will remove the pages from the index. The only other way is to manually remove each and every page one at a time from the index using a web form but that only keeps them off for i believe 2 months then they can be reindexed again and added to the Google index for searching again. robots.txt is the standard for excluding indexing of certain pages.
