Question about duplicate groups (agents) in robots.txt

@Correia994

Posted in: #RobotsTxt #UserAgent

I'm looking at a corner case with robots.txt, and am curious if you have any guidance.

Basically, I'm in a situation where multiple people will be making edits to the same robots.txt file.

I've pointed everyone to Google's robots.txt documentation as our standard for how the file should look, but we're running into a question regarding duplicate groups for the same user agent.

Basically, what happens if you have a robots.txt file structured like this:

User-agent: *
Disallow: *.asd
Disallow: *.exe

User-agent: *
Disallow: /app/
Disallow: /api/


Would all crawlers interpret this as:

User-agent: *
Disallow: *.asd
Disallow: *.exe
Disallow: /app/
Disallow: /api/


Or would they pick one over the other? Google says: "Only one group of records is valid for a particular crawler." So I'd interpret this as meaning a crawler is going to pick one group or the other, but not both...

But I have no direct experience with duplicate groups, i.e. with specifying the same user agent more than once in robots.txt.


2 Comments


@Shanna517

The 1996 draft A Method for Web Robots Control says:


The robot must obey the first record in /robots.txt that contains a User-Agent line whose value contains the name token of the robot as a substring.
[…]
If no such record exists, it should obey the first record with a User-agent line with a "*" value, if present.


(Note that this is not the original robots.txt documentation from 1994, www.robotstxt.org/orig.html, which doesn't cover this case.)

As you noted, the Google documentation also says:


Only one group of records is valid for a particular crawler.


So I'd say that, strictly speaking, only one block has to be taken into account. So yes, subsequent blocks could be ignored.

Of course, some robots.txt parsers (including Google's) might also take subsequent record groups into account, but I don't think you should rely on that.
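
To make the strict reading concrete, here is a rough Python sketch (my own illustration, not any real crawler's parser) that splits the file into record groups and obeys only the first one that applies, following the draft's wording:

# Rough sketch of "first matching record wins", per the 1996 draft.
# Illustration only; real parsers differ in the details.

EXAMPLE = """\
User-agent: *
Disallow: *.asd
Disallow: *.exe

User-agent: *
Disallow: /app/
Disallow: /api/
"""

def parse_groups(text):
    """Split robots.txt into (user_agents, rules) groups at blank lines."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            if agents or rules:
                groups.append((agents, rules))
                agents, rules = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            rules.append((field, value))
    if agents or rules:
        groups.append((agents, rules))
    return groups

def rules_for(robot_name, groups):
    """First record naming the robot wins; otherwise the first '*' record."""
    name = robot_name.lower()
    for agents, rules in groups:
        if any(a != "*" and name in a for a in agents):
            return rules
    for agents, rules in groups:
        if "*" in agents:
            return rules
    return []

print(rules_for("ExampleBot", parse_groups(EXAMPLE)))
# Under this strict reading only the first group applies:
# [('disallow', '*.asd'), ('disallow', '*.exe')]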


@Frith620

Googlebot interprets it as one block, meaning it will not crawl URLs that match any of the four rules.

This is super easy to find out, since Google gives you a robots.txt testing tool once you are registered with Google Webmaster Tools. It's extremely useful for many reasons, so I strongly recommend you register. It would at least have saved you waiting for this answer :)
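
If you'd rather check from code than from the web UI, you can also feed the file to a parser library and see what it decides. Here's a small sketch using Python's standard urllib.robotparser; note that its treatment of duplicate groups and of * wildcards in paths is its own, and won't necessarily match Googlebot's:

from urllib.robotparser import RobotFileParser

# The example from the question: two groups for the same user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: *.asd
Disallow: *.exe

User-agent: *
Disallow: /app/
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# One URL per rule. Whether rules from both groups are applied (and
# whether "*" in a path is treated as a wildcard at all) depends on
# the parser being asked, so treat the output as this parser's opinion.
for url in ("https://example.com/setup.exe",
            "https://example.com/app/page",
            "https://example.com/api/v1"):
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "disallowed")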
