HTTP 303 redirection and robots.txt
On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So: some URLs under /id get 303-redirected to dynamically generated pages under /doc. These dynamic pages are built from a database and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains:
Disallow: /doc
However, we do want the non-redirected pages under /id to get indexed by Google et al:
Allow: /id
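For reference, the combined file would look something like the sketch below (the wildcard User-agent line is an assumption on my part; the paths are the ones described above):
User-agent: *
Allow: /id
Disallow: /doc
Since /id and /doc don't overlap, the order of the two rules shouldn't matter; where rules do overlap, Googlebot uses the most specific (longest) matching path.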
So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt?
If so, we're fine; otherwise I'm going to disallow the /id resources in robots.txt as well, since having crawlers hammer the database would be worse than losing search indexing for the /id pages.
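Purely as an illustration of the redirect side of the pattern (a minimal sketch, not our actual implementation; the paths and port are made up), a 303 handler using Python's standard library might look like:
from http.server import BaseHTTPRequestHandler, HTTPServer

class SeeOtherHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/id/"):
            # /id/... identifies the resource itself; the 303 points
            # clients at a document about it under /doc/...
            self.send_response(303)
            self.send_header("Location", "/doc/" + self.path[len("/id/"):])
            self.end_headers()
        else:
            # /doc/... pages would be built from the database here
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>document page</body></html>")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), SeeOtherHandler).serve_forever()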
So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt?
From a crawler or search engine that obeys robots.txt: yes.
If I put a link to your.com/id or your.com/doc on my own website, Google will crawl it, follow the redirect, read your robots.txt, and skip the /doc target because it is disallowed, so it won't be fetched or indexed.
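One quick way to sanity-check how a robots.txt-obeying client evaluates those rules is Python's urllib.robotparser. It uses first-match rather than Google's longest-match precedence, but for non-overlapping prefixes like /id and /doc the outcome is the same (example.org is just a placeholder host):
from urllib.robotparser import RobotFileParser

# The rules from the question, fed in directly instead of fetched.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /id",
    "Disallow: /doc",
])

# The /id URL itself may be fetched...
print(rp.can_fetch("*", "https://example.org/id/123"))   # True
# ...but the /doc URL it 303-redirects to may not, so an obedient
# crawler that follows the redirect has to stop before fetching it.
print(rp.can_fetch("*", "https://example.org/doc/123"))  # False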