
Robots meta not blocking indexing

@Sherry384

Posted in: #Indexing #MetaTags #RobotsTxt

We have a staging version of our website to test changes on at trailheadpaddleshack.ca/staging1. This never appeared in search before. Recently the staging site has appeared on Google and is affecting our search results.

I'm trying to figure out how the pages got there and how to remove them. The pages have always had <meta name="robots" content="noindex, nofollow"> in the head.

I am kinda new at this but was under the impression this should prevent Google from showing my site in results. I am pretty sure the results appeared in Google after I accidentally copied and pasted some code from the staging site to the live site that contained links to pages on the staging site. If anyone can point me in the right direction to figure out what happened and how to prevent it from happening again, it would be much appreciated.

robots.txt looks like this:

User-agent: *
Disallow: /calendar-2/action~posterboard/
Disallow: /calendar-2/action~agenda/
Disallow: /calendar-2/action~oneday/
Disallow: /calendar-2/action~month/
Disallow: /calendar-2/action~week/
Disallow: /calendar-2/action~stream/
#Begin Attracta SEO Tools Sitemap. Do not remove
sitemap: cdn.attracta.com/sitemap/4035112.xml.gz
#End Attracta SEO Tools Sitemap. Do not remove


I also tried adding an X-Robots-Tag header and submitted the site to be re-crawled. I did that a few days ago and I still see no changes. Here are the HTTP headers according to "Fetch as Google":

HTTP/1.1 200 OK
X-Robots-Tag: noindex,nofollow
Vary: Accept-Encoding
Transfer-Encoding: chunked
Date: Sun, 24 May 2015 16:26:49 GMT
Server: LiteSpeed
Connection: close
X-Pingback: trailheadpaddleshack.ca/staging1/xmlrpc.php
Content-Type: text/html; charset=UTF-8
Link: <http://trailheadpaddleshack.ca/staging1/?p=170>; rel=shortlink


I am now faced with a bunch of results I need to remove from Google asap as they contain out of date information and are affecting our search results. Webmaster Tools has something for removal of a single URL but I am looking to remove the entire /staging1/ subfolder. Any tips?


2 Comments


@Alves908

As I mentioned in my comment above, if the robots meta tag has been on these pages all the time AND the pages are not blocked by robots.txt (which prevents crawling, but not indexing) then these pages should not get indexed.

Digging a little deeper, these pages do appear to be fully "indexed", with a complete description in the SERPs. So they have not been blocked by robots.txt and this is consistent with the robots.txt file in the question, which does not block /staging1. And the live pages do indeed have a noindex robots meta tag.
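
The robots.txt point can be checked offline. Here is a minimal sketch, assuming Python 3 and its standard urllib.robotparser module, with the rules copied from the question. The "Googlebot" user agent string and the full http:// URLs are my assumptions for illustration, and the standard-library parser only approximates how Google itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar-2/action~posterboard/
Disallow: /calendar-2/action~agenda/
Disallow: /calendar-2/action~oneday/
Disallow: /calendar-2/action~month/
Disallow: /calendar-2/action~week/
Disallow: /calendar-2/action~stream/
"""


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical full URLs built from the paths mentioned in the question.
staging_url = "http://trailheadpaddleshack.ca/staging1/"
calendar_url = "http://trailheadpaddleshack.ca/calendar-2/action~month/"

print("staging crawlable:", is_crawlable(staging_url))
print("calendar crawlable:", is_crawlable(calendar_url))
```

The staging URL comes back as crawlable and the calendar URL as blocked, which matches the observation that nothing in this file keeps Google away from /staging1.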

However, checking the Google cache of these pages in the SERPs reveals the problem: There is no robots meta tag! So, you would seem to have experienced a temporary "glitch" about a month ago (the Google cache shows dates of 15 and 21 April) that resulted in the robots meta not being output in the page as it should have been. Consequently Google indexed the pages!
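
If you want to confirm whether a particular copy of a page (the live one, or HTML saved from the cache) actually carries the tag, something along these lines would do. It is only a sketch using Python's standard html.parser module; the class and function names are mine, and the sample markup is made up apart from the meta tag quoted in the question.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up page snippets: one with the tag from the question, one without.
with_tag = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
without_tag = "<html><head><title>Staging</title></head></html>"

print(sorted(robots_directives(with_tag)))
print(sorted(robots_directives(without_tag)))
```

An empty result for a page that is supposed to be hidden is the situation the cached copies appear to show.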


I also tried adding an X-Robots-Tag header and submitted the site to be re-crawled. Did that a few days ago and I still see no changes.


That's the right idea, but it seems you'll need to wait more than a few days. As I mentioned above, the cached pages in the SERPs are over a month old, which suggests that these pages have not been recrawled yet, or that Google has simply not updated its index.
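
While you wait, you can at least verify that the header is really being sent. A rough sketch in Python follows; the function name is mine, and it only handles the plain comma-separated form shown in the question, not headers scoped to a specific crawler. It reads the directives out of a raw header dump such as the one from "Fetch as Google".

```python
def x_robots_directives(raw_headers):
    """Return the set of X-Robots-Tag directives in a raw header dump."""
    directives = set()
    for line in raw_headers.splitlines():
        # Split "Name: value"; the status line has no colon and is skipped.
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "x-robots-tag":
            continue
        for token in value.split(","):
            token = token.strip().lower()
            if token:
                directives.add(token)
    return directives


# Headers copied from the question (shortened to the relevant lines).
fetch_as_google = """\
HTTP/1.1 200 OK
X-Robots-Tag: noindex,nofollow
Vary: Accept-Encoding
Date: Sun, 24 May 2015 16:26:49 GMT
Server: LiteSpeed
Content-Type: text/html; charset=UTF-8
"""

found = x_robots_directives(fetch_as_google)
print("noindex sent:", "noindex" in found)
```

If "noindex" is present here, the server side is fine and the remaining delay is down to Google recrawling the pages.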


@Megan663

I've seen this before and I think it's caused by a vicious circle!

If you are blocking pages from being crawled by Google in robots.txt, then Google cannot access the page to see the NOINDEX tag, so the pages will not be removed from the index if they have already been indexed.

Blocking pages in robots.txt will stop Google crawling them, but it won't stop them getting indexed. If Google finds them linked elsewhere, they can still get indexed.

As for where Google found the links, well, that's another topic entirely!

But if you are using a NOINDEX tag and also blocking the pages in robots.txt, they can still appear in the SERPs, as closetnoc mentioned, usually with a message saying:


'A description for this result is not available because of this site's
robots.txt – learn more.'
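
To keep the combinations straight, here is the reasoning from this thread written out as a tiny Python function. It is only a simplified model of what has been described here, not a description of how Google works internally; the function name, parameters and outcome wording are all mine.

```python
def expected_result(blocked_by_robots_txt, has_noindex, indexed_or_linked):
    """Rough outcome for a URL, following the reasoning in this thread.

    blocked_by_robots_txt: robots.txt disallows crawling the URL
    has_noindex:           page sends a noindex meta tag or X-Robots-Tag
    indexed_or_linked:     URL is already indexed or linked from elsewhere
    """
    if blocked_by_robots_txt:
        # Google cannot crawl the page, so it never sees any noindex.
        if indexed_or_linked:
            return "may stay listed, usually without a description"
        return "not crawled and unlikely to be listed"
    if has_noindex:
        # The page is crawlable, so the directive is seen on the next crawl.
        return "dropped from the index once recrawled"
    return "eligible for indexing"


# The staging site as described: crawlable, noindex in place, already linked.
print(expected_result(False, True, True))

# The "vicious circle": noindex is present but robots.txt hides it.
print(expected_result(True, True, True))
```

The second call is the trap: adding a Disallow for the staging folder now would hide the noindex from Google and could leave the existing results in place.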


The surefire way to keep your URLs out of Google's SERPs is to password-protect the directory they are in.
