Google is indexing pages with a "noindex" robots meta tag
I have come across a very annoying problem.
I have several pages that shouldn't be indexed, as they are basically pop-up pages and "thanks for signing up" pages.
I have set them up with the robots meta tag noindex,nofollow, but for some reason Google lists them anyway.
Try a Google search for "en-til-en-mindfulness-coaching/referencer-popup" and it will turn up in the results as "Se flere referencer - MindfulSolutions". But if you look at the page's head section, you will see that it should not have been indexed.
Why does Google do that and how can I prevent it?
2 Comments
As tillinberlin hints at, this page appears in the search results because of your "robots.txt" file, though not for the reasons given. Your robots.txt file blocks that URL from being crawled, so Google is unable to see the robots meta tag that would prevent the page from being indexed.
As stated in the (Google) search results for that page:
A description for this result is not available because of this site's robots.txt
That particular page is blocked by your robots.txt, because the indexed URL contains a ?. The last rule in your robots.txt blocks any URL containing a ?:
Disallow: /*?*
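To see why that rule catches the URL, here is a minimal Python sketch of wildcard matching as I understand Google applies it: `*` matches any run of characters, a trailing `$` anchors the end, and everything else matches as a prefix. The helper names and the `?popup=1` query string are made up for illustration, and the sketch ignores Allow rules and rule precedence.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one robots.txt path pattern into a compiled regex.

    Assumption: Google-style matching, where '*' is a wildcard and a
    trailing '$' anchors the end of the URL. Hypothetical helper.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_pattern: str) -> bool:
    """Return True if the path (including any query string) matches the rule."""
    return robots_pattern_to_regex(disallow_pattern).match(path_and_query) is not None


rule = "/*?*"
# The query string below is invented; only the path comes from the question.
print(is_disallowed("/en-til-en-mindfulness-coaching/referencer-popup?popup=1", rule))
print(is_disallowed("/en-til-en-mindfulness-coaching/referencer-popup", rule))
```

The first call reports the URL as blocked because it contains a `?`; the second, without a query string, is not matched by this rule.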
"robots.txt" blocks your pages from being crawled, not from being indexed. If other pages link to them, they can still get indexed (as a link-only result with no description, which is what you are seeing here).
A robots "noindex" meta tag (like you have) prevents the page from being indexed. However, if Google is unable to crawl the page, Google is unable to see the robots meta tag!
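The interaction between the two mechanisms can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not how Google actually works internally: `RobotsMetaParser` and `noindex_is_effective` are hypothetical names, and whether the URL is blocked is simply passed in as a flag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def noindex_is_effective(html: str, blocked_by_robots_txt: bool) -> bool:
    """A noindex tag only helps if the crawler is allowed to fetch the page."""
    if blocked_by_robots_txt:
        # The crawler never downloads the HTML, so it never sees the tag.
        return False
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>'
print(noindex_is_effective(page, blocked_by_robots_txt=True))
print(noindex_is_effective(page, blocked_by_robots_txt=False))
```

With the URL blocked, the tag is never read; once crawling is allowed, the same tag takes effect.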
Google (and other "good" search engines and bots) will honour your robots directives if they are correctly implemented. However, "bad" bots could still do anything, since the resources are publicly available.
Short answer: robots.txt is a recommendation that search engines may honour, but they don't have to. So whatever you intend to do, don't rely on robots.txt. The same applies to robots meta tags.
If you really want those pages not to be indexed, or not to be opened from search engine result pages, then you should probably add a 301 redirect or the like for everybody who reaches the page from anywhere other than your own site.
PS: The website robotstxt.org has more details on robots meta tags: About the Robots tag