How to exclude bots from Google Analytics (sudden rise of them in the last few weeks)

@Kimberly868

Posted in: #Analytics #Google #GoogleAnalytics #GoogleAnalyticsSpam #WebCrawlers

For the last few weeks, one of my Google Analytics accounts has shown a continuing rise in visitors and page impressions that seem to be bots. They have an average session duration of «<00:00:01» (under one second) and by now represent about 42% of all visitors shown by Google Analytics. They wreck my stats! :-/

I found the following pattern: they hail mostly from the USA (about 80%), with a small share from Nigeria, China, France and Thailand. Interestingly, they all use «Macintosh» with «Firefox» version «41.0», and almost all report a browser size of «1420x940». Their language setting is «en-us» and they have no Java support. They visit the site «direct», and their hosts and internet providers vary.

How can I exclude those numbers from my Google Analytics? I've used GA for years now, but never had any real problems with bots like I do now…


1 Comment


@Shelton105

It looks like spam to me. Sometimes your site gets targeted by spam bots; you'll want to fix that, but the solution isn't to exclude them from GA but to exclude them from your website.

This might be the least you can do about it:

How to track the user agents that visit your website

With a simple Linux command you can track all the user agents that crawl your website.

# split each access-log line on double quotes; field 6 is the user-agent string
cat test.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -n

The results would look like this:

51916 MetaURI API/2.0 +metauri.com
59899 Twitterbot/1.0
87819 Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
111261 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
187812 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)
189834 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
390477 facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)


The first number is the amount of times that spider/crawler/user agent has accessed your site. Beware: these are not all crawlers, as the data is intermixed with actual human visitors and other useful traffic.
In my example above, you can see that the "facebookexternalhit" user agent accessed the site 390,477 times per month. That is roughly 542 times per hour (over a 30-day month). Excessive. On the kill list you go!
Other heavy ones are FlipboardProxy, Twitterbot, Baiduspider and MetaURI. Those are part crawler, part service. Whatever they are, their usefulness does not justify the amount of traffic/load they put on my server, so… on to some more killing!
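
If you want to put numbers on a specific agent, a couple of quick follow-up checks on the same log will do it. This is a hypothetical sketch: it assumes test.log spans about 30 days (720 hours) and reuses the Macintosh/Firefox 41.0 pattern from the question above.

# total hits from one user agent, plus a rough hourly rate
HITS=$(grep -c "facebookexternalhit" test.log)
echo "$HITS hits, about $((HITS / 720)) per hour"

# count requests whose user agent matches the suspicious pattern from GA
awk -F'"' '{print $6}' test.log | grep -c "Macintosh.*Firefox/41\.0"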

Use .htaccess to redirect bad bots

Pick the bad bots you want to "ban" and add them to a list like this one:
#redirect bad bots to one page
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mediawords [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlipboardProxy [NC]
# avoid a redirect loop: skip requests already aimed at the landing page
RewriteCond %{REQUEST_URI} !/nocrawler.htm
RewriteRule .* /nocrawler.htm [L]

# alternatively, deny them outright; the bad_bot variable must be set first:
BrowserMatchNoCase "facebookexternalhit|Twitterbot|Baiduspider|MetaURI|FlipboardProxy" bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
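
To check that the redirect actually fires, you can spoof one of the banned user agents with curl (example.com here stands in for your own domain):

# pretend to be FlipboardProxy; with the rules above in place, the
# response should be /nocrawler.htm instead of the requested page
curl -A "FlipboardProxy/1.1" -i http://example.com/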


Or use robots.txt (keep in mind it only stops bots that actually honor it; spam bots usually don't):

User-agent: BadBot
Disallow: /
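
For instance, to ask some of the services named above to stay away entirely (again, only bots that respect robots.txt will obey):

User-agent: MetaURI
Disallow: /

User-agent: FlipboardProxy
Disallow: /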

Useful links
www.user-agents.org/ - where the type letter "R" marks robots. www.robotstxt.org/db.html - a database of robots with advanced info on them, including their user agents.
Note that even this isn't a complete list ^^.
