Evidence for automatic browsing - Log file analysis
I'm analyzing web server logs in both Apache and IIS log formats. I want to find evidence of automatic browsing, such as web robots, spiders, bots, etc. I used the Python package robot-detection 0.2.8 to detect robots in my log files, but I know there may be other robots (automated programs) that have traversed the web site which robot-detection cannot identify.
So I want to ask:
Are there any specific clues that can be found in log files that human users do not leave but automated software would?
Do they follow a specific navigation pattern?
I saw some requests for favicon.ico - does this indicate automatic browsing?
I found this article and this question with some valuable points.
Each bot is different. Some bots may only look at your home page and take a screenshot of it. Others may try to download the whole site. There is no way to tell for sure whether an access came from an actual person, but there are some common bot behaviors that can be identified (see the sketch below this list):
A User-Agent string that identifies the robot. Most bots tell you they are bots.
Fetching the robots.txt file. Very few real users look at that file.
Excessive downloads. Most users view only a few pages and their related documents. A bot may try to grab your entire site.
Grabbing just HTML documents but not images, JavaScript, and CSS. Users need the supporting files to see your site, so ignoring those is a sign that it isn't a user. When favicon.ico is fetched, the visitor is more likely to be a real human: many bots don't need that file, while browsers are often set to fetch it to show in the URL bar.
Downloading too fast. Requesting tens of pages in a few seconds isn't something that a user is likely to do.
Downloading at regular intervals. Bots often space out their downloads at regular intervals (every second, for example).
Fetching hidden pages that are not visible to users. It is possible to set up a "bot trap" by making a link on your site that is not normally visible to users (hidden with CSS, for example). Anything that visits that page is likely to be a bot.
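Here is a minimal sketch of how several of these signals could be checked against an Apache combined-format access log. The regular expression, the bot keyword list, the thresholds, and the file name access.log are all assumptions for illustration; adjust them to your own log format and traffic levels, and add an IIS parser if needed.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed Apache "combined" log format; adjust the regex for IIS or custom formats.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

BOT_KEYWORDS = ("bot", "spider", "crawler", "curl", "wget", "python-requests")  # illustrative list
RATE_THRESHOLD = 10  # requests within one second treated as suspicious (arbitrary)

def scan(logfile):
    hits = defaultdict(list)  # ip -> list of (timestamp, path, user agent)
    for line in open(logfile, encoding="utf-8", errors="replace"):
        m = LOG_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("time").split()[0], "%d/%b/%Y:%H:%M:%S")
        hits[m.group("ip")].append((ts, m.group("path"), m.group("agent").lower()))

    for ip, requests in hits.items():
        reasons = []
        agents = {agent for _, _, agent in requests}
        paths = [path for _, path, _ in requests]

        # Signal 1: the User-Agent string admits it is a bot.
        if any(k in agent for agent in agents for k in BOT_KEYWORDS):
            reasons.append("bot-like user agent")

        # Signal 2: it fetched robots.txt.
        if any(p.split("?", 1)[0].endswith("/robots.txt") or p == "/robots.txt" for p in paths):
            reasons.append("fetched robots.txt")

        # Signal 3: HTML only, never any supporting assets (images/CSS/JS/favicon).
        if len(paths) > 5 and not any(
            p.split("?", 1)[0].endswith((".css", ".js", ".png", ".jpg", ".gif", ".ico"))
            for p in paths
        ):
            reasons.append("never requested CSS/JS/images")

        # Signal 4: a burst of requests within a single second.
        per_second = defaultdict(int)
        for ts, _, _ in requests:
            per_second[ts] += 1
        if max(per_second.values()) >= RATE_THRESHOLD:
            reasons.append("burst of requests in one second")

        if reasons:
            print(f"{ip}: {', '.join(reasons)}")

if __name__ == "__main__":
    scan("access.log")  # hypothetical log file name
```

No single signal is conclusive on its own; an IP that trips several of them at once is a much stronger candidate for a bot.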
Not from your log files. However, there are some options that you can use to detect bots.
Most bots do not execute JavaScript. When a page is requested, have the server set a cookie with a unique id and store the log data under that id in a database. On the page, add JavaScript that submits the id back to a server-side endpoint, which then marks in the database that this visitor is probably not a bot. A sketch of this approach follows below.
The false positives will be privacy-conscious users who have disabled JavaScript or users of extremely old browsers.
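A minimal sketch of that idea. Flask, the /beacon endpoint name, the cookie name, and the in-memory dict standing in for a database are my assumptions for illustration; the answer doesn't prescribe any particular framework.

```python
import uuid
from flask import Flask, request, make_response

app = Flask(__name__)

# Stand-in for a real database table: visitor id -> "probably human?" flag.
visitors = {}

# Tiny inline script that posts the cookie back once the page has rendered.
BEACON_JS = """
<script>
  fetch('/beacon', {method: 'POST', credentials: 'same-origin'});
</script>
"""

@app.route("/")
def page():
    visitor_id = request.cookies.get("visitor_id")
    if visitor_id is None:
        visitor_id = uuid.uuid4().hex
        visitors[visitor_id] = False  # assume "bot" until JavaScript proves otherwise
    resp = make_response("<html><body>Hello" + BEACON_JS + "</body></html>")
    resp.set_cookie("visitor_id", visitor_id)
    return resp

@app.route("/beacon", methods=["POST"])
def beacon():
    visitor_id = request.cookies.get("visitor_id")
    if visitor_id in visitors:
        visitors[visitor_id] = True  # JavaScript ran, so probably not a bot
    return ("", 204)

if __name__ == "__main__":
    app.run()
```

Any visitor id that never flips to True in the database is a candidate bot; cross-referencing those ids with the server logs narrows down the automated traffic.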