
Google Webmaster Tools: Robots disallow does not seem to be working for staging site

@Rivera981

Posted in: #GoogleSearchConsole #RobotsTxt

I've added robots.txt on my staging site at staging.mydomain.com as:

User-agent: *
Disallow: /


I then added and verified the staging site in Google Webmaster Tools.
In Crawl > Blocked URLs, I can see robots.txt listed with the status 200 (Success).
Further down that page, when I clicked the Test button to test staging.mydomain.com/, it gave me this result:


Allowed
Detected as a directory; specific files may have different restrictions


This looks like the wrong result. What have I done wrong? Do I have to wait some time for Google to read the robots.txt?

Within the staging site, I have other folders such as:
staging.mydomain.com/test1/
staging.mydomain.com/test2/


Obviously I want to disallow indexing of all of these, but when I do a test for these folders, the result also shows up as Allowed. Do I need to add a robots.txt within each of the sub-directories?





1 Comment


@Fox8124981

It looks like what you've done is perfectly fine.

A typical robots.txt for a production site might be as simple as:

User-agent: *
Disallow:


This is the least restrictive. It says that all crawlers are allowed to crawl the entire site.

For our dev or staging site, we want to use the following:

User-agent: *
Disallow: /


This requests that the entire site not be crawled.
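
If you want to double-check how a standards-compliant crawler interprets those rules without waiting for Webmaster Tools, Python's urllib.robotparser can evaluate them locally. This is only an illustrative sketch; the staging.mydomain.com URLs are the placeholders from the question:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the robots.txt body as a list of lines, so the rules
# can be tested without fetching anything over the network.
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

for url in ("http://staging.mydomain.com/",
            "http://staging.mydomain.com/test1/",
            "http://staging.mydomain.com/test2/"):
    print(url, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")


All three URLs come back as blocked, which also answers the sub-directory question: a single Disallow: / at the root covers every path, so no extra robots.txt files are needed inside /test1/ or /test2/.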

But sometimes, if you didn't take proper precautions before creating your dev or staging site, there's a good chance that the search engines have already found your work in progress.

What now? Well, let's be careful here.

1. First, understand that search engines will cache your site for a certain length of time.

2. Second, you'll need to keep in mind that restricting crawling of your site does not mean that existing indexed pages will disappear from search engine results.

If you find your staging site pages in search results, it's a good idea to go ahead and tell the search engines not to index each page. The best way is to add a "noindex" meta tag to all of your pages. The noindex tag looks like this:

<meta name="robots" content="noindex" />


Or, the advised approach:

1. Add authentication (HTTP or otherwise) in front of requests.

2. Respond with an appropriate response code if access is not permitted (e.g. 401 Unauthorized).

3. Keep everything else from the basic approach above (robots.txt and the noindex tag).

Adding a robots.txt prevents search engines from crawling the content. However, that doesn't mean they won't index the URL: if a search engine knows about a given URL, it may still add it to the search results index. You'll sometimes see these in the search results; the title tends to be the URL with no description. To prevent this from happening, the search engines need to be told not to show the content or the URLs. By adding authentication in front and not responding with a 200 OK status code, you send a strong signal to the engines not to add these URLs to their index. In my experience, I have never seen a page that returns a 401 response code listed in a search engine index.
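
In practice you'd usually configure this in the web server itself (e.g. Apache or nginx basic auth), but as a self-contained sketch of the idea, here is a minimal Python server that answers 401 Unauthorized unless the request carries valid HTTP Basic credentials; the username, password, and port are placeholders:

import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder credentials for the staging site.
EXPECTED = "Basic " + base64.b64encode(b"staging:secret").decode()

class StagingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") != EXPECTED:
            # Missing or wrong credentials: reply 401 and request Basic auth.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="staging"')
            self.end_headers()
            return
        # Authenticated: serve the (placeholder) staging content.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"<html><body>Staging content</body></html>")

if __name__ == "__main__":
    HTTPServer(("", 8000), StagingHandler).serve_forever()


A crawler hitting any URL on this server without credentials only ever sees a 401 response and never the page content, which is exactly the signal described above.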


