What is the best way to exclude bots from view counts?
My website is counting visitor views on certain pages. I noticed that Google and other bots are "clicking" through my site like crazy, and some of the pages get unrealistic view counts (compared to those produced by humans).
I am asking for best practices to exclude those bots from my view counts. Obviously, a simple "user agent contains bot" check won't do it.
I do not think there is a bulletproof solution, nor do I need one.
Note: I am using PHP + MySQL.
You can use an image as a counter; most bots don't request images, so they won't be counted. The page ID is passed as a query string parameter on the image URL.
I'm using something like this in img.php, which updates the page's view count in the database:
<?php
// img.php - increments the view counter for the page ID passed in the query string
if (isset($_GET['ID']) && ctype_digit($_GET['ID'])) {
    $conn = mysqli_connect('localhost', 'xxx', 'xxx', 'xxx');
    mysqli_query($conn, 'UPDATE stats SET stats_vz = stats_vz + 1 WHERE stats_id = ' . (int) $_GET['ID']);
    mysqli_close($conn);
}

// Return a 1x1 transparent PNG so the <img> tag renders nothing visible
$im = imagecreatetruecolor(1, 1);
imagesavealpha($im, true);
imagealphablending($im, false);
imagefill($im, 0, 0, imagecolorallocatealpha($im, 255, 255, 255, 127));
header('Content-Type: image/png');
imagepng($im);
imagedestroy($im);
My approach involves two passes:
Filter for web browsers and game consoles by matching the start of the user agent string against Mozilla|Opera|PSP|Bunjalloo|wii. Since virtually every browser identifies itself as Mozilla-compatible, this check detects almost all browsers.
Exclude bots by the common stop strings bot|crawl|slurp|spider.
So if the first pass succeeds, we assume it is a browser and there is a real visitor behind it. As I found out, though, some bots also pretend to be Mozilla-compatible and start their user agent string that way; that is why the second pass comes in handy and eliminates them.
function isBrowser () {
    // Pass 1: the user agent starts like a browser or console...
    // Pass 2: ...and contains none of the common bot stop strings
    return preg_match( '/^(Mozilla|Opera|PSP|Bunjalloo|wii)/i', $_SERVER['HTTP_USER_AGENT'] )
        && !preg_match( '/bot|crawl|slurp|spider/i', $_SERVER['HTTP_USER_AGENT'] );
}
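A usage sketch, guarding the counter with the check (incrementViewCount() is a hypothetical helper standing in for whatever runs the UPDATE):
if (isBrowser()) {
    // Only count hits whose user agent looks like a real browser
    incrementViewCount($pageId); // hypothetical helper; $pageId identifies the page
}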
I just use simple user agent exclusion.
It gets rid of 99% of the bots hitting my pages.
SELECT * FROM `live_visitors` WHERE (
  lower(agent) NOT LIKE '%bot%' AND
  lower(agent) NOT LIKE '%slurp%' AND
  lower(agent) NOT LIKE '%spider%' AND
  lower(agent) NOT LIKE '%crawl%' AND
  lower(agent) NOT LIKE '%archiver%' AND
  lower(agent) NOT LIKE '%facebook%')
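The same idea can be applied at write time instead, so filtered agents never get into live_visitors in the first place. A sketch, assuming the hits are inserted from PHP and that the table has ip and agent columns (my guess at the schema):
<?php
// Record a visit only if the user agent matches none of the stop strings above
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (!preg_match('/bot|slurp|spider|crawl|archiver|facebook/i', $agent)) {
    $conn = mysqli_connect('localhost', 'xxx', 'xxx', 'xxx');
    $stmt = mysqli_prepare($conn, 'INSERT INTO live_visitors (ip, agent) VALUES (?, ?)');
    $ip = $_SERVER['REMOTE_ADDR'];
    mysqli_stmt_bind_param($stmt, 'ss', $ip, $agent);
    mysqli_stmt_execute($stmt);
    mysqli_close($conn);
}
Whether you filter at insert time or at query time is a design choice; filtering at insert time keeps the table small, while filtering at query time lets you adjust the stop list later.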
I'm glad you know there isn't going to be a bulletproof way to accomplish this. That means your outlook is at least realistic.
Since JavaScript is not an option, I would say you're left with:
Check the user agent for the word "bot". That will catch most of them.
Compile a list of known bots and filter them based on some kind of unique identifier, probably their user agent.
Put a hidden link in the footer of your website that points to a page that collects user agents and/or IP addresses. Users won't see the link, but bots will follow it, so anyone who visits that page is almost certainly a bot. Record them and then exclude them from your stats (see the sketch after this list).
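A minimal sketch of that trap page, assuming a hypothetical bot_traps table (the file, table, and column names are mine, not from the answer):
<?php
// trap.php - target of an invisible footer link that real users never click.
// Every request that lands here is recorded so it can be excluded from stats.
$ip    = $_SERVER['REMOTE_ADDR'];
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

$conn = mysqli_connect('localhost', 'xxx', 'xxx', 'xxx');
$stmt = mysqli_prepare($conn, 'INSERT INTO bot_traps (ip, agent, seen_at) VALUES (?, ?, NOW())');
mysqli_stmt_bind_param($stmt, 'ss', $ip, $agent);
mysqli_stmt_execute($stmt);
mysqli_close($conn);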
There are three fairly simple ways:
Use Google Analytics, which will process and handle all the data for you, and present you with detailed statistics for visitors and how they got to your site. This is by far the easiest solution.
Use JavaScript to do the counting. When the page has loaded, fire an AJAX request to your counting script. Most robots and spiders don't run JavaScript (but see the update below).
Detecting "bot" in the user agent string is actually fairly reliable. Alternatively, you could stick to known bots only such as Googlebot, Yahoo, MSNbot etc. Checking those three should cover 99% of your bot traffic. This page has some others but it looks quite out of date.
UPDATE: Googlebot and some other major bots do run JavaScript these days, so option #2 alone is no longer sufficient. However, that also means using it in conjunction with #3 should be quite reliable: most bots are excluded because they never run the JS, and the major ones that do, like Googlebot, can then be excluded on the server side.
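As a sketch of that combination, the page could fire an AJAX request after load to a counting script like the one below. It reuses the stats table from the img.php answer above and a regexp filter for major bots that do execute JavaScript (the file name, table, and bot list are assumptions):
<?php
// count.php - hit via AJAX after the page loads; most bots never execute that JS
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match('/googlebot|bingbot|slurp|bot|crawl|spider/i', $agent)) {
    exit; // major crawlers that do run JS are excluded here
}
if (isset($_GET['ID']) && ctype_digit($_GET['ID'])) {
    $conn = mysqli_connect('localhost', 'xxx', 'xxx', 'xxx');
    mysqli_query($conn, 'UPDATE stats SET stats_vz = stats_vz + 1 WHERE stats_id = ' . (int) $_GET['ID']);
    mysqli_close($conn);
}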
Also, as mentioned in the comments, you could try using the Google Analytics API to display the views for each page.
If you use JavaScript to count views, then most bots won't run it and so won't be included in your view counts. This answer may be close to what you want: stackoverflow.com/questions/1973448/how-can-i-count-a-page-views