Hamaas447

Which token from a long User-Agent should I use in robots.txt?

@Hamaas447

Posted in: #RobotsTxt

The definition of User-Agent states that several tokens can be included, as deemed necessary by the client.

I want to block certain bots via robots.txt and I am confused as to which part of the User-Agent string to use, especially for more obscure bots. For example:

Mozilla/5.0 (compatible; uMBot-LN/1.0; mailto: crawling@ubermetrics-technologies.com)"
JS-Kit URL Resolver, js-kit.com/
Mozilla/5.0 (compatible; SEOkicks-Robot +http://www.seokicks.de/robot.html


Do I use the second token? Can tokens contain spaces, or did the SEOkicks folks forget a semicolon after SEOkicks-Robot? I don't mean to limit my question to a couple of bots. I want to know the general guideline: which part of the UA do I place in robots.txt for these exotic bots with a UA as long as a haiku? For example, is this correct?

User-agent: uMBot-LN/1.0
Disallow: /


PS: Thank you but I do not need to hear that undesirable bots are better blocked with mod_security. I already have commercial mod_sec rules in place.


@Eichhorn148

Web crawlers that support robots.txt often publish a page about their crawler with instructions about how to block the crawler in robots.txt:


Google:


The Google user-agent is (appropriately enough) Googlebot.

Yandex:


Examples: User-agent: Yandex

Yahoo:


Yahoo Slurp obeys the first entry in the robots.txt file with a User-agent containing "Slurp."



There is also a database of robot names that can be used in robots.txt on the robots website: www.robotstxt.org/db.html
Unfortunately, neither of the robots that you posted as examples has a page that I could find, nor is either listed in the robots database. However, as a pattern, I would not expect a slash (and the version number after it) to be appropriate in the User-agent line of robots.txt. None of the examples that I have come across recommend that, and the original robots.txt specification recommends that robots match on the name without version information. So I would use:

User-agent: uMBot-LN
Disallow: /

User-agent: SEOkicks-Robot
Disallow: /
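
To check a rule like this before deploying it, one option is to run it through a robots.txt parser. The sketch below uses Python's standard-library `urllib.robotparser`, which is only one implementation and not necessarily how uMBot or SEOkicks read the file. The `is_blocked` helper and the example URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, crawler_token: str, url: str) -> bool:
    """Return True if robots_txt disallows url for the given crawler token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(crawler_token, url)


# Rule written with the bare product name, as suggested above.
NAME_ONLY = """\
User-agent: uMBot-LN
Disallow: /
"""

# Rule written with the version suffix, as in the question.
WITH_VERSION = """\
User-agent: uMBot-LN/1.0
Disallow: /
"""

# Hypothetical URL, used only for illustration.
URL = "https://example.com/some/page"

# This parser drops everything after the first "/" in the token it is
# given, so "uMBot-LN/1.0" is compared as "umbot-ln".
blocked_by_name = is_blocked(NAME_ONLY, "uMBot-LN/1.0", URL)
blocked_by_versioned_rule = is_blocked(WITH_VERSION, "uMBot-LN/1.0", URL)

print(blocked_by_name)
print(blocked_by_versioned_rule)
```

With this parser, the name-only rule blocks the crawler and the rule that includes `/1.0` does not match at all, which is consistent with leaving the version out. A real crawler may match differently, so treat this as a sanity check rather than proof.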
