User-agent identification and SEO crawler database
I have been asked to analyze the traffic log of a site. In particular, I have to identify crawlers from the collected user-agent values.
I know there are 'trap' links that you can use to distinguish crawlers from human visitors, but for now I only want to analyze the user-agent values.
My question: is there a public catalogue or library of web crawlers?
Edit: Here is a second question. There are also a lot of empty user-agent values in my traffic records. Is an empty User-Agent header a sign of a crawler or of an automated process?
3 Comments
This week our company (Incapsula) launched Botopedia.org, a community-sourced bot directory. It's 100% free and open to all, and you can use it to find a complete user-agent list for any bot you'll want to look up.
As for identification methods, I'd refer you to this discussion on Security.StackExchange, which covers different methods of bot identification (e.g. JS challenge, method check, robots.txt access and more).
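To illustrate just one of those methods, here is a minimal Python sketch of the robots.txt access check: it collects the clients that requested /robots.txt, which ordinary browsers rarely do on their own. The record layout (IP, path, user agent) and the function name are assumptions for the example, not the format of any particular log, and a match is only a hint, not proof.

```python
def clients_requesting_robots(records):
    """Return the set of (ip, user_agent) pairs that requested /robots.txt.

    `records` is assumed to be an iterable of (ip, path, user_agent) tuples
    already parsed from the traffic log (hypothetical layout).
    """
    suspects = set()
    for ip, path, user_agent in records:
        # Ignore any query string when comparing the requested path.
        if path.split("?", 1)[0] == "/robots.txt":
            suspects.add((ip, user_agent))
    return suspects


# Made-up sample records using documentation-only IP ranges.
sample = [
    ("203.0.113.5", "/robots.txt", "ExampleBot/1.0"),
    ("203.0.113.5", "/index.html", "ExampleBot/1.0"),
    ("198.51.100.7", "/index.html", "Mozilla/5.0"),
]
print(clients_requesting_robots(sample))
```

Anything this returns is worth cross-checking against a bot directory, since well-behaved crawlers fetch robots.txt but badly behaved ones often skip it.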
It's highly unlikely you're going to find a completely universal list of user agents, in part because they can simply be made up. Even setting that aside, maintaining one would be a ridiculous amount of work. You just need to compile a few resources and then do some further searching for anything else you don't recognize. (Surprisingly, I can't find a Wikipedia "List of…" article for this.)
Here's a massive list of nothing but iOS UA strings. If you look at how quickly some of those change in the date column, and take into account that the document was last updated 10 weeks ago, it's quite possibly already missing something.
UserAgentString.com seems more recently maintained than user-agents.org. Every one of those product names leads to a separate page with its own sometimes-huge list.
user-agent-string.info has a lot of non-browser ones that appear to be missing from the sites above, so it might also be good to have around.
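Once you have compiled a few lists, a rough first pass over the log might look like the Python sketch below. The token list is a tiny hand-picked sample and the bucket names are made up for illustration; empty values (and the "-" placeholder many log formats write for a missing header) get their own bucket so you can look at them separately, and everything unmatched stays "unknown" for manual searching.

```python
import re

# Small illustrative sample of crawler tokens; extend it from the lists you trust.
KNOWN_CRAWLER_TOKENS = ["googlebot", "bingbot", "baiduspider", "yandexbot"]

# Loose heuristic for self-declared bots that are not in the list above.
GENERIC_BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def classify_user_agent(user_agent):
    """Sort one user-agent value into a coarse bucket (names are hypothetical)."""
    ua = (user_agent or "").strip()
    # Assumption: "-" is how the log records a missing header.
    if ua in ("", "-"):
        return "empty"
    lowered = ua.lower()
    for token in KNOWN_CRAWLER_TOKENS:
        if token in lowered:
            return "known_crawler"
    if GENERIC_BOT_PATTERN.search(ua):
        return "probable_crawler"
    # Could be a browser, or a crawler using a made-up string.
    return "unknown"


for value in [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "ExampleSpider/0.1",
    "",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]:
    print(repr(value), "->", classify_user_agent(value))
```

Because user agents can be faked, treat "known_crawler" as "claims to be a known crawler" rather than as a verified identity.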