
User-agent identification and SEO crawler database

@Kristi941

Posted in: #Seo #WebCrawlers

I have been asked to analyze the traffic log of a site. In particular, I have to identify the crawlers from the collected user-agent values.

I know there are 'trap' links that you can use to distinguish crawlers from human visitors, but for now I only want to analyze the user-agent values.

Now the question: is there a public catalogue or library of web crawlers?

Edit

Here is a second question. There are also a lot of empty user-agent values in my traffic records. Does an empty User-Agent header indicate a crawler or some other automated process?


3 Comments


@Ogunnowo487

This week our company (Incapsula) launched Botopedia.org, a community-sourced bot directory. It's 100% free and open to all, and you can use it to find a complete user-agent list for any bot you want to look up.

As for identification methods, I want to refer you to this discussion on Security.StackExchange, which covers different methods of bot identification (e.g. JS challenge, method check, robots.txt access and more).
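To make two of those signals concrete, here is a rough Python sketch that flags clients that requested robots.txt or sent an empty user agent (which also touches on the second question). It is only an illustration: the record keys `ip`, `path` and `user_agent`, the function name, and the sample addresses are all assumptions, so adapt them to whatever your log parser produces. Both signals are heuristics rather than proof. Browsers normally send a User-Agent header, so an empty one usually points to a script or tool, but it says nothing about intent.

```python
from collections import defaultdict


def flag_probable_bots(records):
    """Group heuristic bot signals by client IP.

    `records` is an iterable of dicts with the hypothetical keys
    'ip', 'path' and 'user_agent' (adapt to your own log format).
    """
    signals = defaultdict(set)
    for rec in records:
        ip = rec["ip"]
        # Strip any query string before comparing the path.
        path = rec.get("path", "").split("?", 1)[0]
        if path == "/robots.txt":
            signals[ip].add("requested robots.txt")
        # Many log formats write "-" when the header is missing.
        ua = (rec.get("user_agent") or "").strip()
        if ua in ("", "-"):
            signals[ip].add("empty user agent")
    return {ip: sorted(found) for ip, found in signals.items()}


# Made-up sample records, using documentation-reserved IP ranges.
sample = [
    {"ip": "203.0.113.5", "path": "/robots.txt", "user_agent": "ExampleBot/1.0"},
    {"ip": "203.0.113.5", "path": "/index.html", "user_agent": "ExampleBot/1.0"},
    {"ip": "198.51.100.7", "path": "/index.html", "user_agent": "-"},
    {"ip": "192.0.2.9", "path": "/index.html", "user_agent": "Mozilla/5.0"},
]
print(flag_probable_bots(sample))
```

Grouping by IP is a simplification; several clients can share one address, so treat the result as a list of candidates to inspect rather than a verdict.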


@Cody1181609

It's highly unlikely you're going to find a completely universal list of user agents, in part because they can just be made up. Even setting that aside, maintaining one would be a ridiculous amount of work. You just need to compile a few resources and then search further for anything else you don't recognize. (Surprisingly, I can't find a Wikipedia "List of…" article for this.)


Here's a massive list of nothing but iOS UA strings. If you look at how fast some of those get changed in the date column and take into account the last update to the document was 10 weeks ago, it's quite possibly already missing something.
UserAgentString.com seems more recently maintained than user-agents.org. Every one of those product names leads to a separate page with its own sometimes-huge list.
user-agent-string.info has a lot of non-browser ones that appear to be missing from the previous one, so it might also be good to have around.
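
The "compile a few resources, then search for whatever you don't recognize" workflow could look roughly like the Python sketch below. The token tuple is a tiny illustrative sample, not a real catalogue, and the function name is made up; in practice you would fill the list from the sites above. Since user agents can be made up, a match only tells you what a client claims to be.

```python
from collections import Counter

# Illustrative sample only; extend it from whichever lists you compile.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot", "baiduspider", "yandexbot")


def triage_user_agents(user_agents, known_tokens=KNOWN_CRAWLER_TOKENS):
    """Split user-agent strings into recognized crawlers and the rest.

    Returns two Counters: hits per known token, and occurrences of each
    unrecognized string (empty values are counted under "(empty)").
    """
    known = Counter()
    unknown = Counter()
    for ua in user_agents:
        cleaned = (ua or "").strip()
        lowered = cleaned.lower()
        # Case-insensitive substring match against the compiled tokens.
        token = next((t for t in known_tokens if t in lowered), None)
        if token is not None:
            known[token] += 1
        else:
            unknown[cleaned or "(empty)"] += 1
    return known, unknown


known, unknown = triage_user_agents([
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "",
    "curl/7.68.0",
    "curl/7.68.0",
])
# The most frequent unrecognized strings are the ones worth searching for first.
print(known.most_common())
print(unknown.most_common())
```

Note that ordinary browser strings also end up in the unrecognized bucket here, so you would probably want to filter those out before going through it by hand.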


@Eichhorn148

The first result of a Google search is probably what you are looking for: www.user-agents.org/
