Robots meta not blocking indexing
We have a staging version of our website to test changes on at trailheadpaddleshack.ca/staging1. This never appeared in search before. Recently the staging site has appeared on Google and is affecting our search results.
I'm trying to figure out how the pages got there and how to remove them. The pages have always had <meta name="robots" content="noindex, nofollow"> in the head.
I am kind of new at this, but I was under the impression this should prevent Google from showing my site in results. I am pretty sure the results appeared in Google after I accidentally copied and pasted some code from the staging site to the live site that contained links to pages on the staging site. If anyone can point me in the right direction to figure out what happened and prevent it from happening again, it would be much appreciated.
robots.txt looks like this:
User-agent: *
Disallow: /calendar-2/action~posterboard/
Disallow: /calendar-2/action~agenda/
Disallow: /calendar-2/action~oneday/
Disallow: /calendar-2/action~month/
Disallow: /calendar-2/action~week/
Disallow: /calendar-2/action~stream/
#Begin Attracta SEO Tools Sitemap. Do not remove
sitemap: cdn.attracta.com/sitemap/4035112.xml.gz
#End Attracta SEO Tools Sitemap. Do not remove
I also tried adding an X-Robots-Tag header and submitted the site to be re-crawled. I did that a few days ago and I still see no changes. Here are the HTTP headers according to "Fetch as Google":
HTTP/1.1 200 OK
X-Robots-Tag: noindex,nofollow
Vary: Accept-Encoding
Transfer-Encoding: chunked
Date: Sun, 24 May 2015 16:26:49 GMT
Server: LiteSpeed
Connection: close
X-Pingback: trailheadpaddleshack.ca/staging1/xmlrpc.php
Content-Type: text/html; charset=UTF-8
Link: <http://trailheadpaddleshack.ca/staging1/?p=170>; rel=shortlink
I am now faced with a bunch of results I need to remove from Google asap as they contain out of date information and are affecting our search results. Webmaster Tools has something for removal of a single URL but I am looking to remove the entire /staging1/ subfolder. Any tips?
More posts by @Sherry384
2 Comments
As I mentioned in my comment above, if the robots meta tag has been on these pages all the time AND the pages are not blocked by robots.txt (which prevents crawling, but not indexing) then these pages should not get indexed.
Digging a little deeper, these pages do appear to be fully "indexed", with a complete description in the SERPs. So they have not been blocked by robots.txt and this is consistent with the robots.txt file in the question, which does not block /staging1. And the live pages do indeed have a noindex robots meta tag.
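As a rough way to double-check that observation, the rules from the question can be fed to Python's standard `urllib.robotparser`. This is only a sketch: the `is_crawlable` helper and the `Googlebot` agent string are assumptions made for illustration, the sitemap lines are left out, and Google's own parser may differ from Python's in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt in the question (sitemap lines omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar-2/action~posterboard/
Disallow: /calendar-2/action~agenda/
Disallow: /calendar-2/action~oneday/
Disallow: /calendar-2/action~month/
Disallow: /calendar-2/action~week/
Disallow: /calendar-2/action~stream/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: True if these rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


staging_ok = is_crawlable(ROBOTS_TXT, "http://trailheadpaddleshack.ca/staging1/")
calendar_ok = is_crawlable(
    ROBOTS_TXT, "http://trailheadpaddleshack.ca/calendar-2/action~month/"
)
print(staging_ok, calendar_ok)  # prints: True False
```

Under these rules /staging1/ is crawlable, so Google was able to fetch the staging pages and read whatever was (or was not) in the head at the time.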
However, checking the Google cache of these pages in the SERPs reveals the problem: There is no robots meta tag! So, you would seem to have experienced a temporary "glitch" about a month ago (the Google cache shows dates of 15 and 21 April) that resulted in the robots meta not being output in the page as it should have been. Consequently Google indexed the pages!
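To catch that kind of glitch earlier, a small check can confirm that a given response actually carries a noindex directive, either in the robots meta tag or in an X-Robots-Tag header. The following is a minimal sketch using only the standard library. The names `RobotsMetaParser` and `has_noindex` and the sample HTML strings are made up for illustration, and the header handling is simplified (it ignores user-agent prefixes such as `googlebot: noindex`).

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def has_noindex(html: str, headers: Optional[Dict[str, str]] = None) -> bool:
    """True if the meta tag or the X-Robots-Tag header asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = set(parser.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            directives.update(v.strip().lower() for v in value.split(",") if v.strip())
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


# Hypothetical page sources: one with the tag, one resembling the cached copy.
live_html = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
cached_html = "<html><head><title>Staging</title></head></html>"

print(has_noindex(live_html))    # True
print(has_noindex(cached_html))  # False
print(has_noindex(cached_html, {"X-Robots-Tag": "noindex,nofollow"}))  # True
```

Running something like this against the staging pages after each deployment would flag a missing tag before Google has a chance to crawl it.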
"I also tried adding an X-Robots-Tag header and submitted the site to be re-crawled. Did that a few days ago and I still see no changes."
That's the right idea, but it seems you'll need to wait more than a few days. As I mentioned above, the cached pages in the SERPs are over a month old, which suggests that these pages have not been recrawled yet, or that Google has simply not updated its index.
I've seen this before and I think it's caused by a vicious circle!
If you are blocking pages from being crawled by Google in robots.txt, then Google cannot access the page to see the NOINDEX tag, so the pages will not be removed from the index if they have already been indexed.
Blocking pages in robots.txt will stop Google crawling them, but it won't stop them getting indexed. If Google finds them linked elsewhere, they can still get indexed.
But where did Google find the links? Well, that's another topic entirely!
But if you are using the NOINDEX tag and also blocking the pages in robots.txt, they can still appear in the SERPs, as closetnoc mentioned, usually with a message saying:
'A description for this result is not available because of this site's robots.txt – learn more.'
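The interaction between the two mechanisms can be summarised in a few lines of Python. This is a deliberately simplified sketch of the reasoning above, not a model of how Google actually behaves: `likely_outcome`, the three outcome strings and the `BLOCKED`/`OPEN` rule sets are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

NOT_CRAWLED = "not crawled: noindex unseen, URL may still be listed without a description"
DROPPED = "crawled: noindex seen, page should drop out of the index"
INDEXABLE = "crawled: no noindex, page can be indexed"


def likely_outcome(robots_txt: str, url: str, page_has_noindex: bool,
                   agent: str = "Googlebot") -> str:
    """Rough summary of how robots.txt and noindex interact for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        # The crawler never fetches the page, so it never reads the noindex.
        return NOT_CRAWLED
    return DROPPED if page_has_noindex else INDEXABLE


# Hypothetical rule sets: one blocks the staging folder, one blocks nothing.
BLOCKED = "User-agent: *\nDisallow: /staging1/\n"
OPEN = "User-agent: *\nDisallow:\n"
URL = "http://trailheadpaddleshack.ca/staging1/"

print(likely_outcome(BLOCKED, URL, page_has_noindex=True))
print(likely_outcome(OPEN, URL, page_has_noindex=True))
print(likely_outcome(OPEN, URL, page_has_noindex=False))
```

The first case is the vicious circle: adding a Disallow rule for pages that are already indexed hides the very tag that would get them removed.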
The surefire way to guarantee Google doesn't include your URLs in the SERPs is to password protect the directory they are in.