How do you configure robots.txt to allow crawling of the site except for a few directories?
What is the best initial or general setup for robots.txt to allow search engines to go through the site, but restrict a few folders?
Is there a general setup that should always be used?
The best configuration, if you don't have any special requirements, is nothing at all. (Although you may at least want to add a blank file to avoid 404s filling up your error logs.)
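If you would rather serve an explicit allow-everything file than a blank one, an empty 'Disallow' rule does the job:
User-agent: *
Disallow: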
To block a directory on the site, use the 'Disallow' clause:
User-agent: *
Disallow: /example/
There is also an 'Allow' clause which overrides matching 'Disallow' clauses, so if you've disallowed the 'example' folder you can still open up a subfolder like 'example/foobar', as shown below.
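For instance, to block everything under /example/ while keeping the foobar subfolder crawlable (major crawlers such as Googlebot apply the most specific matching rule):
User-agent: *
Disallow: /example/
Allow: /example/foobar/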
Remember that robots.txt doesn't prevent anyone from visiting those pages if they want to, so if some pages should remain secret you should hide them behind some kind of authentication (e.g. a username/password).
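As a rough sketch, assuming an Apache server with Basic authentication available, a directory can be password-protected with an .htaccess file like this (the .htpasswd path is a placeholder):
# Place this .htaccess in the directory you want to protect
AuthType Basic
AuthName "Restricted"
AuthUserFile /path/to/.htpasswd
Require valid-user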
The other directive that is likely to appear in many robots.txt files is 'Sitemap', which specifies the location of your XML sitemap if you have one. Put it on a line of its own, using the full URL:
Sitemap: https://www.example.com/sitemap.xml
The official robots.txt site (www.robotstxt.org) has lots more information on the various options, but in general the vast majority of sites will need very little configuration.
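Putting those pieces together, a typical minimal robots.txt might look something like this (the paths and sitemap URL are placeholders):
User-agent: *
Disallow: /example/
Allow: /example/foobar/
Sitemap: https://www.example.com/sitemap.xml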
You can use Google Webmaster Tools to do this; it is very helpful for creating a robots.txt.
Google Webmaster Tools has a section called "Crawler access" that lets you build the file very easily.
For example, to allow everything except a folder called 'test', your robots.txt would look something like this:
User-agent: *
Disallow: /test/
Allow: /