
What is the best way to exclude bots from view counts?

@Steve110

Posted in: #BestPractices #Mysql #Php #WebCrawlers

My website counts visitor views on certain pages. I noticed that Google and other bots are "clicking" through my site like crazy, and some of the pages get unrealistic view counts (compared to those produced by humans).

I am asking for the best practice to exclude those bots from my view counts. Obviously, a simple check for whether the user agent contains "bot" won't do it.

I do not think there is a bulletproof solution, nor do I need one.

Note: I am using PHP + MySQL.


6 Comments


@Barnes591

You can use an image as a counter; bots that do not load images will then not be counted. The page is identified by a query parameter on the image URL.

I'm using this as img.php, which updates the page view count in the database:

<?php
// img.php: count one view for the page given by ?ID=..., then send a 1x1 transparent PNG.
$xnt = filter_input(INPUT_GET, "ID", FILTER_VALIDATE_INT); // int if valid, false or null otherwise
if (is_int($xnt)) {
    $conn = mysqli_connect("localhost", "xxx", "xxx", "xxx"); // server, user, password, database
    $stmt = mysqli_prepare($conn, "UPDATE stats SET stats_vz=stats_vz+1 WHERE stats_id=?"); // ID stays out of the SQL string
    mysqli_stmt_bind_param($stmt, "i", $xnt);
    mysqli_stmt_execute($stmt);
    mysqli_close($conn);
}
// Build the fully transparent pixel.
$im = imagecreatetruecolor(1, 1);
imagesavealpha($im, true); imagealphablending($im, false);
imagefill($im, 0, 0, imagecolorallocatealpha($im, 255, 255, 255, 127));
header("Content-Type: image/png");
imagepng($im); imagedestroy($im);
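
For completeness, here is a minimal sketch of how a page might embed that counter image. The helper name counterImageTag is hypothetical, and the sketch assumes img.php sits in the same directory as the page and that the page knows its own stats_id.

```php
<?php
// Hypothetical helper: builds the <img> tag that triggers img.php for one page.
function counterImageTag(int $pageId, string $script = 'img.php'): string
{
    // http_build_query URL-encodes the value; htmlspecialchars keeps the URL safe inside the attribute.
    $src = $script . '?' . http_build_query(['ID' => $pageId]);
    return '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" width="1" height="1" alt="">';
}

// Usage inside a page template, where 42 is an example stats_id:
// echo counterImageTag(42);
```

Only clients that actually request the image reach img.php, which is what keeps non-rendering crawlers out of the count.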


@Ravi8258870

My approach involves two passes:

1. Keep only web browsers and consoles by matching the start of the user agent string against Mozilla|Opera|PSP|Bunjalloo|wii. Because almost every browser identifies itself with a user agent string that starts with Mozilla, this check detects almost all browsers.
2. Exclude bots by the common stop strings bot|crawl|slurp|spider.

If the first pass succeeds, we assume the client is a browser with a real visitor behind it. As I found out, though, some bots pretend to be Mozilla compatible and start their user agent string with it. That is why the second pass comes in handy: it eliminates them.

function isBrowser () {
    // A request may carry no User-Agent header at all; treat that as "not a browser".
    $agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    return preg_match( '/^(Mozilla|Opera|PSP|Bunjalloo|wii)/i', $agent )
        && !preg_match( '/bot|crawl|slurp|spider/i', $agent );
}
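
To see how the two passes behave, the sketch below applies the same patterns to a few sample strings. The function name looksLikeBrowser and the sample user agents are illustrative assumptions; the agent is passed in as a parameter only so the check can be tried without a real request.

```php
<?php
// Same two-pass check as isBrowser(), with the user agent passed in explicitly.
function looksLikeBrowser(string $agent): bool
{
    // Pass 1: the string must start like a browser or console.
    $startsLikeBrowser = preg_match('/^(Mozilla|Opera|PSP|Bunjalloo|wii)/i', $agent) === 1;
    // Pass 2: it must not contain a common bot keyword.
    $hasBotKeyword = preg_match('/bot|crawl|slurp|spider/i', $agent) === 1;
    return $startsLikeBrowser && !$hasBotKeyword;
}

// Sample strings for illustration only.
$samples = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'curl/8.4.0',
];
foreach ($samples as $agent) {
    echo (looksLikeBrowser($agent) ? 'count ' : 'skip  ') . $agent . PHP_EOL;
}
```

The first sample passes both checks, the second starts with Mozilla but is caught by the stop string "bot", and the third fails the first pass.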


@Sent6035632

I just use a simple exclusion based on parsing the user agent.
It gets rid of 99% of the bots visiting my pages.

SELECT * FROM `live_visitors` where (
lower(agent) not like '%bot%' and
lower(agent) not like '%slurp%' and
lower(agent) not like '%spider%' and
lower(agent) not like '%crawl%' and
lower(agent) not like '%archiver%' and
lower(agent) not like '%facebook%')


@Sarah324

I'm glad you know there isn't going to be a bulletproof way to accomplish this. That means your outlook is at least realistic.

Since JavaScript is not an option I would say you're left with:


1. Check the user-agent for the word "bot" in it. That will catch most of them.
2. Compile a list of known bots and filter them based on some kind of unique identifier, probably their user-agent.
3. Put a hidden link in the footer of your website that links to a page that collects user-agents and/or IP addresses. Users won't see this but bots will. So anyone who visits that page will be a bot. Record them and then block them from your stats.
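
A rough sketch of how the hidden-link trap could feed the counter is shown here. The function names and the in-memory array are assumptions made for illustration; on a PHP + MySQL site the recorded addresses would more likely live in a table.

```php
<?php
// Hypothetical trap page logic: remember every address that requests the hidden link.
function recordTrapHit(array $blockedIps, string $ip): array
{
    $blockedIps[$ip] = true; // keyed by IP so repeated hits collapse into one entry
    return $blockedIps;
}

// Decide whether a page view should be counted.
function shouldCountView(array $blockedIps, string $ip, string $agent): bool
{
    if (isset($blockedIps[$ip])) {
        return false; // this address followed the hidden link earlier
    }
    // Simple user-agent check: skip anything containing "bot".
    return stripos($agent, 'bot') === false;
}

// Example addresses come from the reserved documentation ranges.
$blocked = recordTrapHit([], '203.0.113.7');
var_dump(shouldCountView($blocked, '203.0.113.7', 'Mozilla/5.0'));  // bool(false)
var_dump(shouldCountView($blocked, '198.51.100.2', 'Mozilla/5.0')); // bool(true)
```

The trap list and the user-agent check are independent, so either one alone is enough to exclude a request from the stats.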


@Cofer257

There are three fairly simple ways:


1. Use Google Analytics, which will process and handle all the data for you, and present you with detailed statistics for visitors and how they got to your site. This is by far the easiest solution.
2. Use JavaScript to do the counting. When the page has loaded, generate an AJAX request to your counting script. Robots and spiders don't run JavaScript.
3. Detecting "bot" in the user agent string is actually fairly reliable. Alternatively, you could stick to known bots only such as Googlebot, Yahoo, MSNbot etc. Checking those three should cover 99% of your bot traffic. This page has some others but it looks quite out of date.


UPDATE: Googlebot and some major bots do run JavaScript these days. So using option #2 alone is no longer viable. However, this does mean using it in conjunction with #3 should be quite reliable, as you can easily exclude most bots by using JS, then on the server-side exclude major bots like Googlebot that do run JS.
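
The server-side half of that combination might look like the sketch below. The endpoint name count.php and the function isCountableRequest are hypothetical, and the bot list simply reuses the names mentioned above (Yahoo's crawler identifies itself as Slurp); it is not a verified list of crawlers that run JavaScript.

```php
<?php
// Example entries only, taken from the bots named in this answer; extend as needed.
const EXCLUDED_BOTS = ['googlebot', 'slurp', 'msnbot'];

// Returns true when a request to the counting endpoint should increment the counter.
function isCountableRequest(string $agent): bool
{
    if ($agent === '') {
        return false; // no User-Agent header at all
    }
    foreach (EXCLUDED_BOTS as $needle) {
        if (stripos($agent, $needle) !== false) {
            return false; // a known bot reached the endpoint, so it ran the script
        }
    }
    return true;
}

// In count.php (hypothetical name), which the page's AJAX request calls:
// if (isCountableRequest($_SERVER['HTTP_USER_AGENT'] ?? '')) { /* increment the view counter */ }
```

Bots that never run the script never reach this endpoint, so the list only has to cover the crawlers that do.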

Also, as mentioned in the comments, you could try using the Google Analytics API to display views for each page.


@Goswami781

If you use JavaScript to count views, most bots won't run it and so won't be included in your view counts. This answer may be close to what you want: stackoverflow.com/questions/1973448/how-can-i-count-a-page-views
