Evidence for automatic browsing - Log file analysis

@Jamie184

Posted in: #Http #WebHosting

I'm analyzing web server logs in both Apache and IIS log formats. I want to find evidence of automatic browsing: web robots, spiders, bots, and so on. I used the Python package robot-detection 0.2.8 to detect robots in my log files, but I know there may be other robots (automated programs) that have traversed the site which robot-detection cannot identify.
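For reference, here is roughly how I'm applying robot-detection now (a simplified sketch; it assumes the Apache combined log format, where the User-Agent is the last quoted field):

```python
import robot_detection

with open("access.log") as f:
    for line in f:
        # In the combined log format, the User-Agent is the last quoted field.
        user_agent = line.rsplit('"', 2)[-2]
        if robot_detection.is_robot(user_agent):
            print("known robot:", user_agent)
```

This only matches the User-Agent against a list of known robots, so any program that sends a browser-like User-Agent slips through.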

So I want to ask:


Are there specific clues in log files that human users do not leave but automated software does?
Do bots follow specific navigation patterns?
I saw some requests for favicon.ico. Does this indicate automatic browsing?


I found this article and this question with some valuable points.





2 Comments


 

@Heady270

Each bot is different. Some bots may only look at your home page and take a screenshot of it; others may try to download your whole site. There is no way to tell for sure whether an access came from an actual person, but there are some common bot behaviors that can be identified (the sketch after this list shows how to check a few of them against a log file):


A User-Agent string that identifies the robot. Most bots tell you they are bots.
Fetching the robots.txt file. Very few real users look at that file.
Excessive downloads. Most users view only a few pages and their related documents. A bot may try to grab your entire site.
Fetching only HTML documents but not images, JavaScript, or CSS. Users need those supporting files to see your site, so ignoring them is a sign that the visitor isn't a user. When favicon.ico is fetched, the visitor is more likely to be a real human: most bots don't need that file, while browsers are often set to fetch it to show in the URL bar.
Downloading too fast. Requesting tens of pages in a few seconds isn't something a user is likely to do.
Downloading at regular intervals. Bots often space out their downloads at regular intervals (every second, for example).
Fetching hidden pages that are not visible to users. It is possible to set up a "bot trap" by placing a link on your site that is not normally visible to users (hidden with CSS, for example). Anything that visits that page is likely to be a bot.
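Here is a minimal sketch of checking a few of these signals against an Apache combined-format log. It is illustrative only: the filename, the User-Agent keywords, and the thresholds (at least ten requests averaging under a second apart; gaps that vary by less than half a second) are assumptions to tune for your own traffic.

```python
import re
from collections import defaultdict
from datetime import datetime

# Apache combined log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
BOT_WORDS = ("bot", "crawl", "spider", "slurp")  # common self-identifying agents

hits = defaultdict(list)   # ip -> request timestamps
robots_txt = set()         # ips that requested /robots.txt
declared = set()           # ips whose User-Agent names a robot

with open("access.log") as f:
    for raw in f:
        m = LINE.match(raw)
        if not m:
            continue
        ip = m["ip"]
        hits[ip].append(datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"))
        if m["path"].startswith("/robots.txt"):
            robots_txt.add(ip)
        if any(word in m["agent"].lower() for word in BOT_WORDS):
            declared.add(ip)

for ip, times in hits.items():
    times.sort()
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    # "too fast": at least ten requests averaging under one second apart
    too_fast = len(gaps) >= 10 and sum(gaps) / len(gaps) < 1.0
    # "regular intervals": the gaps between requests barely vary
    regular = len(gaps) >= 5 and max(gaps) - min(gaps) < 0.5
    if ip in declared or ip in robots_txt or too_fast or regular:
        print(ip, "shows bot-like behavior")
```

The same approach extends to the other signals: count HTML versus image/CSS/JS requests per IP, or flag any IP that requests your hidden trap page.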



 

@Angela700

Not from your log files. However, there are some options that you can use to detect bots.

Most bots do not execute JavaScript. When the page is accessed, have the server create a cookie with a unique ID and store the log data under that ID in a database. On the page, add JavaScript that submits the ID to a server-side program, which then marks in the database that this visitor is probably not a bot.
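A minimal sketch of that technique, assuming Flask (any server-side framework works the same way); the in-memory dict stands in for a real database:

```python
import uuid
from flask import Flask, make_response, request

app = Flask(__name__)
visits = {}  # visit id -> "probably human" flag; use a database in practice

@app.route("/")
def page():
    visit_id = str(uuid.uuid4())
    visits[visit_id] = False  # assume bot until the beacon fires
    html = """<html><body>
      <h1>Hello</h1>
      <script>
        // Most bots never run this, so the beacon marks likely humans.
        fetch("/beacon", {method: "POST"});
      </script>
    </body></html>"""
    resp = make_response(html)
    resp.set_cookie("visit_id", visit_id)
    return resp

@app.route("/beacon", methods=["POST"])
def beacon():
    visit_id = request.cookies.get("visit_id")
    if visit_id in visits:
        visits[visit_id] = True  # JavaScript ran: probably not a bot
    return "", 204
```

Join this flag with your request logs by the same ID and you can separate likely humans from likely bots after the fact.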

The false positives will be privacy-conscious users who have disabled JavaScript and people using extremely old browsers.


