
Is it possible to block an entire site from Google, then list exceptions?

@Sue5673885

Posted in: #Google #Seo #WebCrawlers

I usually disallow subdirectories in the robots.txt file, and I was wondering if it's possible to do it the other way: block everything and list the main index file and the other pages I'd like search engines to index. Is that possible?

My current robots.txt is as follows:

User-agent: *
Disallow: /example/
Disallow: /yea.html
Allow: /


@Jessie594

Yes. You can disallow everything first, then allow the files and folders you want crawlers to access.

User-agent: *
Disallow: /
Allow: /index.html
Allow: /example/
Allow: /example2/
Allow: /example3/


This works because Google (and Bing) apply a specificity rule when interpreting robots.txt: the matching rule with the longest path takes precedence over shorter, less specific rules. From Google's robots.txt documentation:


"...for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule. The order of precedence for rules with wildcards is undefined."


They give a table of examples on that page. Note that the order of the rules makes no difference to how they're interpreted; only their length matters. You could move the Disallow rule in the example above to the end of the file and it would still work as intended.
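The longest-match behaviour is easy to model. Below is a minimal, hypothetical Python sketch of it, not Google's actual matcher, and it ignores wildcards (whose precedence is undefined anyway): every rule whose path is a prefix of the requested path is a candidate, the longest candidate wins, and ties go to Allow (the least restrictive rule).

```python
# Hypothetical sketch of Google's longest-match precedence for robots.txt rules.
# Rules are (directive, path) pairs; wildcards are not handled here.

def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    The matching rule with the longest path wins; on a tie, Allow wins.
    If no rule matches, crawling is allowed by default.
    """
    best = None  # (path_length, directive) of the winning rule so far
    for directive, rule_path in rules:
        if path.startswith(rule_path):
            length = len(rule_path)
            if (best is None or length > best[0]
                    or (length == best[0] and directive == "Allow")):
                best = (length, directive)
    return best is None or best[1] == "Allow"

# The answer's example file, as (directive, path) pairs.
rules = [
    ("Disallow", "/"),
    ("Allow", "/index.html"),
    ("Allow", "/example/"),
]

print(is_allowed("/index.html", rules))      # True: Allow /index.html outranks Disallow /
print(is_allowed("/private/page", rules))    # False: only Disallow / matches
print(is_allowed("/example/a.html", rules))  # True: Allow /example/ outranks Disallow /
```

Because only length decides, reordering the `rules` list gives identical results, which mirrors the point above about rule order in the file being irrelevant.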

Don't forget to test your robots.txt file using Google Webmaster Tools (now Search Console):


To test a site's robots.txt file:

1. On the Webmaster Tools Home page, click the site you want.
2. Under Site configuration, click Crawler access.
3. If it's not already selected, click the Test robots.txt tab.
4. Copy the content of your robots.txt file, and paste it into the first box.
5. In the URLs box, list the site to test against.
6. In the User-agents list, select the user-agents you want.
