Question about duplicate groups (agents) in robots.txt

@Correia994

Posted in: #RobotsTxt #UserAgent

I'm looking at a corner case with robots.txt, and am curious if you have any guidance.

Basically, I'm in a situation where multiple people will be making edits to the same robots.txt file.

I've pointed everyone to Google's robots.txt documentation as our standard for how the file should look, but we're running into a question regarding duplicate groups for the same user agent.

Basically, what happens if you have a robots.txt file structured like this:

User-agent: *
Disallow: *.asd
Disallow: *.exe

User-agent: *
Disallow: /app/
Disallow: /api/


Would all crawlers interpret this as:

User-agent: *
Disallow: *.asd
Disallow: *.exe
Disallow: /app/
Disallow: /api/


Or would they pick one over the other? Google says: "Only one group of records is valid for a particular crawler." So I'd interpret this as meaning a crawler is going to pick one group or the other, but not both...

But I have no direct experience with duplicate groups, i.e. with specifying the same user agent more than once in robots.txt.


2 Comments


@Shanna517

The 1996 draft A Method for Web Robots Control says:


The robot must obey the first record in /robots.txt that contains a User-Agent line whose value contains the name token of the robot as a substring.
[…]
If no such record exists, it should obey the first record with a User-agent line with a "*" value, if present.


(Note that this is not the original robots.txt documentation from 1994, www.robotstxt.org/orig.html, which doesn't cover this case.)

As you noted, the Google documentation also says:


Only one group of records is valid for a particular crawler.


So I'd say that, strictly speaking, only one block has to be taken into account. So yes, subsequent blocks could be ignored.

Of course, some robots.txt parsers (including Google's) might also take subsequent record groups into account, but I don't think you should rely on that.
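
To make the strict reading concrete, here is a rough Python sketch (my own illustration, not any real crawler's parser) that splits the file into record groups and obeys only the first one that applies, following the draft's wording:

# Rough sketch of "first matching record wins", per the 1996 draft.
# Illustration only; real parsers differ in the details.

EXAMPLE = """\
User-agent: *
Disallow: *.asd
Disallow: *.exe

User-agent: *
Disallow: /app/
Disallow: /api/
"""

def parse_groups(text):
    """Split robots.txt into (user_agents, rules) groups at blank lines."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            if agents or rules:
                groups.append((agents, rules))
                agents, rules = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            rules.append((field, value))
    if agents or rules:
        groups.append((agents, rules))
    return groups

def rules_for(robot_name, groups):
    """First record naming the robot wins; otherwise the first '*' record."""
    name = robot_name.lower()
    for agents, rules in groups:
        if any(a != "*" and name in a for a in agents):
            return rules
    for agents, rules in groups:
        if "*" in agents:
            return rules
    return []

print(rules_for("ExampleBot", parse_groups(EXAMPLE)))
# Under this strict reading only the first group applies:
# [('disallow', '*.asd'), ('disallow', '*.exe')]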


@Frith620

Googlebot interprets it as one block, meaning it will not crawl URLs that match any of the four rules.

This is super easy to find out, since Google gives you a robots.txt testing tool once you are registered with Google Webmaster Tools. It's extremely useful for many reasons, so I strongly recommend you register. It would at least have saved you waiting for this answer :)
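
If you'd rather check from code than from the web UI, you can also feed the file to a parser library and see what it decides. Here's a small sketch using Python's standard urllib.robotparser; note that its treatment of duplicate groups and of * wildcards in paths is its own, and won't necessarily match Googlebot's:

from urllib.robotparser import RobotFileParser

# The example from the question: two groups for the same user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: *.asd
Disallow: *.exe

User-agent: *
Disallow: /app/
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# One URL per rule. Whether rules from both groups are applied (and
# whether "*" in a path is treated as a wildcard at all) depends on
# the parser being asked, so treat the output as this parser's opinion.
for url in ("https://example.com/setup.exe",
            "https://example.com/app/page",
            "https://example.com/api/v1"):
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "disallowed")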
