
Google Indexing Pages with noindex meta tag

@Vandalay111

Posted in: #GoogleSearchConsole #Noindex #RobotsTxt #SearchEngines #Seo

I got a message in Google Webmaster Tools about "Googlebot found an extremely high number of URLs on your site" with a long list of example URLs.

For some of the pages listed, there was nothing to prevent them from being indexed. However, for some of the URLs, I have "noindex, nofollow" meta tags as follows:

<meta name="googlebot" content="noindex, follow" />
<meta name="bingbot" content="noindex, follow" />
<meta name="msnbot" content="noindex, follow" />
<meta name="slurp" content="noindex, follow" />
<meta name="teoma" content="noindex, follow" />


I have read that Google will sometimes index pages that you block in robots.txt if it finds another link that points to the page, but supposedly it respects meta tags?


4 Comments


 

@Candy875

Don't add nofollow to your noindex, as you want PageRank to flow through those pages to others normally; you just don’t want them indexed.

Therefore, on the pages you don't want indexed, just add <meta name="robots" content="noindex, follow"> to the <head> section.

Make sure to remove your robots.txt exclusions, as with them present, the meta won't be seen, and the pages won't be removed.
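To see why the robots.txt exclusion gets in the way, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are made up for illustration, and the helper name `is_crawl_blocked` is hypothetical. It only models the robots.txt rule itself, not how Google actually schedules crawling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foldername/
"""


def is_crawl_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one inside the disallowed folder, one outside it.
    for url in ("https://example.com/foldername/page.html",
                "https://example.com/other/page.html"):
        if is_crawl_blocked(ROBOTS_TXT, "Googlebot", url):
            print(url, "-> blocked: a noindex meta tag here cannot be read")
        else:
            print(url, "-> crawlable: a noindex meta tag here can be read")
```

A page that comes back as blocked is one where the crawler never fetches the HTML, so any meta tag in it goes unseen.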

An alternative, slightly more convoluted method is to exclude the pages in robots.txt and use the Google URL Removal Tool in Webmaster Tools. Note that the Robots Exclusion Standard will only keep the pages out of the index if you also perform the URL removals using Google's tool and do not undo them. If the removals are undone in your Webmaster Tools account, the pages might become indexed again if, for example, they are discovered from a source other than your site.



 

@Smith883

The Webmaster Tools message "Googlebot found an extremely high number of URLs on your site" just tells you that Googlebot found those URLs and is crawling them. Google has a help page that explains the message a bit and gives examples such as "calendar pages" that go on forever.

With your meta tag you just tell Google to read the page and throw away the content (noindex), but to follow all links on that page (follow). If you do not want Google to follow those links, you should use nofollow instead of follow.
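As a rough illustration of how the two parts of the content value are read independently, the sketch below splits a robots meta content string into an index flag and a follow flag. The function name `parse_robots_content` is hypothetical, and the sketch only covers the index/follow directives plus the `none` shorthand; it is not a model of any search engine's actual parser.

```python
def parse_robots_content(content: str) -> dict:
    """Interpret a robots meta 'content' value as index/follow booleans.

    Illustrative only: handles noindex, nofollow and the 'none' shorthand
    (equivalent to 'noindex, nofollow'). Anything else keeps the defaults,
    which are to index the page and follow its links.
    """
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    index = not ({"noindex", "none"} & tokens)
    follow = not ({"nofollow", "none"} & tokens)
    return {"index": index, "follow": follow}


if __name__ == "__main__":
    for value in ("noindex, follow", "noindex, nofollow"):
        print(repr(value), "->", parse_robots_content(value))
```

With the tags from the question ("noindex, follow"), the page itself is dropped from the index while its links are still followed.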

Setting a Disallow in robots.txt will prevent the bot from accessing the disallowed URLs, but not from keeping your (linked) pages in the search index.



 

@Radia820

It's important to note that nofollow, noindex and even blocking via robots.txt don't necessarily mean that the content won't be crawled. In fact, these pages can still be indexed but hidden from public search results (yes, Google is naughty, but it's true). When you use noindex on a page, Google needs to crawl the page to find that tag. Googlebot does not process one line at a time and stop when it hits the tag; it downloads the entire page, which is most likely why the URLs are being reported in Google Webmaster Tools.

So you may see these pages within Webmaster Tools, but that doesn't mean they are included in the actual search results. Simply do a site:yourdomain.com search in Google and see if those pages are found. I suspect they are not, unless the tag somehow got ignored.

Google actually recommends using both the robots meta tag and robots.txt to keep content out of the public search results. Also, you should not need to use individual bot names in the meta name; a simple "robots" should do the trick.

Your meta name should look like this:

<meta name="robots" content="noindex, nofollow">


And you should make a robots.txt like so:

User-agent: *
Disallow: /foldername/



 

@Welton855

That looks like it should be correct, per instructions from Google. A few suggestions:


Make sure the meta tags are within the <head> tag
Make sure the meta tags actually say "noindex, nofollow" (your code says "noindex, follow" - not sure if that's just a copying mistake)
Use the standard <meta name="robots" content="noindex, nofollow"> instead of trying to keep up with the different search engines
Wait for Google to crawl your pages again, if you've only recently added/changed the meta tags, or use the URL removal request to try to expedite the removal of some URLs.
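The first two checks above can be automated in a rough way. The following sketch uses Python's standard `html.parser` to list robots-style meta tags in a page and report whether each one sits inside `<head>`. The class name `RobotsMetaFinder`, the `audit` helper and the sample HTML are all hypothetical, and the set of bot names is just the ones mentioned in this thread.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect robots-style meta tags and whether each is inside <head>."""

    # Names taken from this thread plus the generic "robots"; extend as needed.
    BOT_NAMES = {"robots", "googlebot", "bingbot", "msnbot", "slurp", "teoma"}

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (name, content, inside_head) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in self.BOT_NAMES:
                self.found.append((name, attr.get("content") or "", self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit(html: str) -> list:
    """Return (name, content, inside_head) for every robots-style meta tag."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    # Hypothetical page: one tag correctly in <head>, one misplaced in <body>.
    sample = (
        '<html><head><meta name="googlebot" content="noindex, follow" /></head>'
        '<body><meta name="robots" content="noindex, nofollow"></body></html>'
    )
    for name, content, inside_head in audit(sample):
        place = "inside <head>" if inside_head else "OUTSIDE <head>"
        print(f"{name}: {content!r} ({place})")
```

Running it against the saved source of a live page shows at a glance which directives are actually present and where they sit.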
