Click counting and crawler bots

@Martha676

Posted in: #GoogleAdsense #Php #RobotsTxt

I am currently running a small affiliate program for Facebook users. We use an auto-poster to publish links to fan pages. Every hit is stored in our database, and we block repeat hits from the same IP address for 24 hours. My problem is that the PHP script also stores every hit from the bots that crawl my website. I was thinking of blocking those bots with my site's robots.txt, but I am afraid this will have a negative effect on my AdSense ads.
Does anybody have an idea how to work this out?

1 Comment

@Martha676

Avoid using robots.txt to block known robots. You want robots to crawl your site; you only need to distinguish which hits should be logged and which should not.

To solve this problem, you could check the client's user agent and store only the hits that meet your criteria. For example, you could use a simple function to check whether the hit comes from a known search engine or other crawler (you can add more keys to the list):

function client_is_crawler($user_agent)
{
    //Set the crawlers list (each value is a regex fragment; alternatives are separated by |)
    $crawlers = array(
        'google' => 'GoogleBot|Google Web Preview|Mediapartners-Google|Wireless\s*Transcoder',
        'alexa' => 'ia_archiver',
        'yahoo' => 'Yahoo! Slurp',
        'msn' => 'msnbot',
        'bing' => 'bingbot',
        'apache_bench' => 'ApacheBench',
        'baiduspider' => 'Baiduspider',
        'grapeshot' => 'GrapeshotCrawler',
        'archive.org' => 'archive.org_bot',
        'spider' => 'spider',
        'indexer' => 'indexer',
        'admantx' => 'admantx.com',
        'robot' => 'robot',
        'bot' => 'bot',
        'search' => 'search',
        'genieo' => 'Genieo'
    );

    //Loop through crawlers list, and check if client is a known crawler or not
    foreach ($crawlers as $key => $crawler)
    {
        //Group the alternatives so the word boundaries (\b) apply to each of them
        if (preg_match('/\b(?:' . $crawler . ')\b/i', $user_agent) > 0) //It is
            return $key;
    }

    return false; //No, it is not
}


Then call it with the client's user agent string, taken from the $_SERVER superglobal array:

//The User-Agent header may be missing, so fall back to an empty string (requires PHP 7+)
if (client_is_crawler($_SERVER['HTTP_USER_AGENT'] ?? '') === false) //Client is not a bot
    do_store_hit();
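
The crawler check can also be combined with the 24-hour IP block described in the question. The following is only a minimal sketch under stated assumptions: looks_like_crawler() is a hypothetical, much shorter stand-in for client_is_crawler(), should_store_hit() is a hypothetical helper name, and a plain array keyed by IP address stands in for the database table. Treating an empty user agent as a crawler is also an assumption, not something the answer above prescribes.

```php
<?php
// Hypothetical, simplified stand-in for client_is_crawler(): a short pattern list only.
function looks_like_crawler(string $user_agent): bool
{
    // Assumption: a missing or empty user agent is treated as a crawler.
    if ($user_agent === '') {
        return true;
    }
    return preg_match('/bot|spider|crawl|slurp|mediapartners/i', $user_agent) === 1;
}

// Decide whether a hit should be counted.
// $last_hits maps IP address => Unix timestamp of the last counted hit
// (an in-memory stand-in for the database used in the question).
function should_store_hit(string $ip, string $user_agent, array &$last_hits, int $now): bool
{
    // Crawler hits are never counted and never touch the per-IP record.
    if (looks_like_crawler($user_agent)) {
        return false;
    }
    // Reject a repeat hit from the same IP within 24 hours (86400 seconds).
    if (isset($last_hits[$ip]) && ($now - $last_hits[$ip]) < 86400) {
        return false;
    }
    // Count the hit and remember when it happened.
    $last_hits[$ip] = $now;
    return true;
}
```

In the real script, $ip would typically come from $_SERVER['REMOTE_ADDR'], $now from time(), and the lookup and update of $last_hits would be replaced by queries against the hits table. Checking the user agent first means crawler requests never create or refresh an IP record.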
