
HTTP 303 redirection and robots.txt

@Rambettina238

Posted in: #303Redirect #Googlebot #RobotsTxt

On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So some URLs under /id get redirected to dynamically created pages under /doc. These dynamic pages are built from a database and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains:

Disallow: /doc


However, we do want the non-redirected pages under /id to get indexed by Google et al:

Allow: /id
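
As a rough local check of how these two rules behave together, they can be fed to Python's standard urllib.robotparser. This is only a sketch: the "User-agent: *" line, the example.com host and the sample paths are assumptions that do not appear in the post. Python's parser applies the first matching rule while Google applies the most specific one, but for these two non-overlapping rules the results should agree. Note that Disallow: /doc is a prefix match, so it also covers paths such as /docs/help.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post, placed in one group. The "User-agent: *"
# line is an assumption; the post does not show which group they sit in.
rules = """\
User-agent: *
Disallow: /doc
Allow: /id
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)


def allowed(path):
    # example.com is a placeholder host; only the path is matched.
    return parser.can_fetch("*", "http://example.com" + path)


# Hypothetical sample paths, chosen to show the prefix matching.
results = {p: allowed(p) for p in ("/id/42", "/doc/42", "/docs/help", "/about")}
for path, ok in results.items():
    print(path, "allowed" if ok else "blocked")
```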


So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt?

If yes, we're OK, but otherwise I'm going to disallow all /id resources in the robots file, as having the crawler hammer the db would be worse than losing search indexing for the /id pages.


1 Comment


@Connie744

So the question I have, which I can't find an answer to so far, is: if
an allowed /id page 303-redirects to a /doc page, will it still be
blocked by robots.txt?


For any crawler or search engine that obeys robots.txt: yes.

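If I put a link to your.com/id or your.com/doc on my own website, Google will find it and check your robots.txt before fetching each URL. A link to /id is allowed, so it gets fetched and returns the 303. The redirect target under /doc is then checked against robots.txt as well, and because it is disallowed it does not get crawled.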
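
The hop-by-hop check described above can be sketched offline in Python. This is an illustration, not how any particular search engine is implemented: the redirect table, the example.com URLs, the "User-agent: *" group and the trace_crawl helper are all hypothetical, and no network requests are made.

```python
from urllib.robotparser import RobotFileParser

# Rules from the question, wrapped in an assumed "User-agent: *" group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /doc
Allow: /id
"""

# Hypothetical redirect table standing in for the site's 303 responses.
REDIRECTS = {"http://example.com/id/42": "http://example.com/doc/42"}


def trace_crawl(start_url, robots_txt=ROBOTS_TXT, redirects=REDIRECTS, agent="*"):
    """Return (url, allowed) pairs for each hop a compliant crawler considers."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hops = []
    seen = set()
    url = start_url
    while url is not None and url not in seen:
        seen.add(url)
        allowed = parser.can_fetch(agent, url)
        hops.append((url, allowed))
        # A disallowed URL is never requested, so any redirect it would
        # return is never seen and the walk stops here.
        url = redirects.get(url) if allowed else None
    return hops


for hop_url, hop_allowed in trace_crawl("http://example.com/id/42"):
    print(hop_url, "fetch" if hop_allowed else "skip (robots.txt)")
```

In this sketch the /id URL is fetched and the /doc target is skipped, so the database-backed page is never requested by a crawler that honours robots.txt.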
