HTTP 303 redirection and robots.txt
On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So: some URLs under /id get 303-redirected to dynamically generated pages under /doc. These dynamic pages are built from a database and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains:
Disallow: /doc
However, we do want the non-redirected pages under /id to get indexed by Google et al:
Allow: /id
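For reference, the combined file would look something like the sketch below (the wildcard User-agent line is an assumption on my part; the paths are the ones described above):
User-agent: *
Allow: /id
Disallow: /doc
Since /id and /doc don't overlap, the order of the two rules shouldn't matter; where rules do overlap, Googlebot uses the most specific (longest) matching path.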
So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt?
If so, we're fine; otherwise I'm going to disallow the /id resources in robots.txt as well, since having crawlers hammer the database would be worse than losing search indexing for the /id pages.
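Purely as an illustration of the redirect side of the pattern (a minimal sketch, not our actual implementation; the paths and port are made up), a 303 handler using Python's standard library might look like:
from http.server import BaseHTTPRequestHandler, HTTPServer

class SeeOtherHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/id/"):
            # /id/... identifies the resource itself; the 303 points
            # clients at a document about it under /doc/...
            self.send_response(303)
            self.send_header("Location", "/doc/" + self.path[len("/id/"):])
            self.end_headers()
        else:
            # /doc/... pages would be built from the database here
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>document page</body></html>")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), SeeOtherHandler).serve_forever()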
So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt?
From a crawler or search engine that obeys robots.txt: yes.
If I put a link to your.com/id or your.com/doc on my own website, Google will crawl it, follow the redirect, read your robots.txt, and skip the /doc target because it is disallowed, so it won't be fetched or indexed.
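One quick way to sanity-check how a robots.txt-obeying client evaluates those rules is Python's urllib.robotparser. It uses first-match rather than Google's longest-match precedence, but for non-overlapping prefixes like /id and /doc the outcome is the same (example.org is just a placeholder host):
from urllib.robotparser import RobotFileParser

# The rules from the question, fed in directly instead of fetched.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /id",
    "Disallow: /doc",
])

# The /id URL itself may be fetched...
print(rp.can_fetch("*", "https://example.org/id/123"))   # True
# ...but the /doc URL it 303-redirects to may not, so an obedient
# crawler that follows the redirect has to stop before fetching it.
print(rp.can_fetch("*", "https://example.org/doc/123"))  # False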