How do you configure robots.txt to allow crawling of the site except for a few directories?
What is the best initial or general setup for robots.txt to allow search engines to go through the site, but restrict a few folders?
Is there a general setup that should always be used?
The best configuration, if you don't have any special requirements, is nothing at all. (Although you may at least want to add a blank file to avoid 404s filling up your error logs.)
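If you would rather serve an explicit allow-everything file than a blank one, an empty 'Disallow' rule does the job:
User-agent: *
Disallow: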
To block a directory on the site, use the 'Disallow' clause:
User-agent: *
Disallow: /example/
There is also an 'Allow' clause which overrides matching 'Disallow' clauses, so if you've disallowed the 'example' folder you can still open up a subfolder like 'example/foobar', as shown below.
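For instance, to block everything under /example/ while keeping the foobar subfolder crawlable (major crawlers such as Googlebot apply the most specific matching rule):
User-agent: *
Disallow: /example/
Allow: /example/foobar/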
Remember that robots.txt doesn't prevent anyone from visiting those pages if they want to, so if some pages should remain secret you should hide them behind some kind of authentication (e.g. a username/password).
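As a rough sketch, assuming an Apache server with Basic authentication available, a directory can be password-protected with an .htaccess file like this (the .htpasswd path is a placeholder):
# Place this .htaccess in the directory you want to protect
AuthType Basic
AuthName "Restricted"
AuthUserFile /path/to/.htpasswd
Require valid-user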
The other directive that is likely to appear in many robots.txt files is 'Sitemap', which specifies the location of your XML sitemap if you have one. Put it on a line of its own, using the full URL:
Sitemap: https://www.example.com/sitemap.xml
The official robots.txt site (www.robotstxt.org) has lots more information on the various options, but in general the vast majority of sites will need very little configuration.
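Putting those pieces together, a typical minimal robots.txt might look something like this (the paths and sitemap URL are placeholders):
User-agent: *
Disallow: /example/
Allow: /example/foobar/
Sitemap: https://www.example.com/sitemap.xml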
You can use Google Webmaster Tools to do this; it is very helpful for creating a robots.txt.
Google Webmaster Tools has a section called "Crawler access" that lets you build the file very easily.
For example, to allow everything except a folder called 'test', your robots.txt would look something like this:
User-agent: *
Disallow: /test/
Allow: /