Robots.txt: block all webpages except a few?
I have a few doubts regarding robots.txt. Say my domain is stackoverflow.com.
A) Will the code below do the following for all the crawlers?
User-agent: *
Disallow: /
Allow: /$
Allow: /a/$
Allow: /a/login.php
Allow: /a/login.php?return=/pligg/
Accepting stackoverflow.com/ (will this also accept stackoverflow.com?)
Accepting stackoverflow.com/a/
Accepting stackoverflow.com/a/login.php
Accepting stackoverflow.com/a/login.php?return=/pligg/
Not accepting any other page on stackoverflow.com
B) Which is right: robots.txt or robot.txt?
Your robots.txt is invalid: blank lines are not allowed within a record. It should look like this:
User-agent: *
Disallow: /
Allow: /$
Allow: /a/$
Allow: /a/login.php
Allow: /a/login.php?return=/pligg/
Will the code below do the following for all the crawlers?
No, your robots.txt won’t work that way for all crawlers.
Allow is not part of the original robots.txt specification. Only some parsers understand it (and they may have implemented the wildcard matching differently); all other parsers will simply ignore the Allow lines.
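As a concrete illustration of that difference, Python's standard-library parser, urllib.robotparser, does understand Allow but does not implement the $ end-anchor wildcard and evaluates rules in file order, so the broad Disallow: / matches every URL before any Allow line is reached. A minimal sketch (the URLs are just the examples from the question):

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt from the answer above.
robots_txt = """\
User-agent: *
Disallow: /
Allow: /$
Allow: /a/$
Allow: /a/login.php
Allow: /a/login.php?return=/pligg/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# urllib.robotparser checks rules in file order and treats "$" as a
# literal character, so "Disallow: /" matches every path first:
print(rp.can_fetch("*", "https://stackoverflow.com/"))             # False
print(rp.can_fetch("*", "https://stackoverflow.com/a/login.php"))  # False
```

Googlebot, by contrast, honors Allow and the $ wildcard and prefers the most specific matching rule, so the same file can behave as intended for Google while blocking everything for a parser like the one above; that is exactly why the behavior is not uniform across crawlers.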