Problem with my robots.txt
I want to allow a certain HTML file and my site's index file to be indexed by search engines. Everything else should be disallowed. My home directory does not actually contain an index file; I am using .htaccess to redirect to /cgi-bin/index.cgi. I am currently using this:
User-agent: *
Allow: /cgi-bin/index.cgi
Allow: /contact.html
Disallow: /
However, Google Webmaster Tools is saying:
Googlebot is blocked from mydomain.com/
Is there a way of allowing indexing of the root while blocking all other files, i.e., mydomain.com/*?
As suggested by Pekka, you may want to try placing the Allow directives after the Disallow directive.
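For example, the reordered file would look like this (a sketch using your existing paths; mydomain.com stands in for your actual domain):

User-agent: *
Disallow: /                 # block everything first
Allow: /cgi-bin/index.cgi   # then open up the two pages
Allow: /contact.html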
But given the differences in interpretation between Google, Bing, and other crawlers, you may want to use a robots meta tag instead. This will be safer and more granular. (Note that a crawler can only see a meta tag on pages it is allowed to fetch, so this approach replaces the robots.txt Disallow rather than complementing it.)
In your disallowed pages:
<meta name="robots" content="noindex" />
In your allowed pages:
<meta name="robots" content="index" />
(to be placed in your <head> tag)
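For instance, on contact.html the tag would sit inside the document head like this (a minimal sketch; the title text is just a placeholder):

<head>
  <title>Contact</title> <!-- placeholder title -->
  <meta name="robots" content="index" />
</head>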
See googlewebmastercentral.blogspot.com/2007/03/using-robots-meta-tag.html
Maybe try it the other way round: put the Disallow before the Allow.
If the Wikipedia article on robots.txt is correct, it should work:
While by standard implementation the first matching robots.txt pattern always wins, Google's implementation differs in that Allow patterns with equal or more characters in the directive path win over a matching Disallow pattern.[8] Bing uses the Allow or Disallow directive which is the most specific.[9]
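Applied to your file, that means Google should honor Allow: /cgi-bin/index.cgi over Disallow: / regardless of order, because the Allow path is longer and therefore more specific. The bare root URL mydomain.com/, however, matches only Disallow: /, which would explain the Webmaster Tools warning. A hedged sketch that also allows the root, assuming Google's support for the $ end-of-URL anchor (a Google extension, not part of the original robots.txt standard):

User-agent: *
Allow: /$                   # exactly the root URL; $ marks end of URL (Google extension)
Allow: /cgi-bin/index.cgi
Allow: /contact.html
Disallow: /                 # everything else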