Ravi8258870

Preventing Twitterbot from accessing my website

@Ravi8258870

Posted in: #RobotsTxt #Twitter #WebCrawlers

I have a Twitter application, so users of my application share links to my web pages in their tweets. It seems that bots follow these links, and some of these bots cause high bandwidth usage. Most of them don't bring me any visitors in return. So I want to disallow them with robots.txt or an .htaccess file.
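A minimal sketch of the robots.txt route, checked with Python's standard `urllib.robotparser`. The rules are a hypothetical example that blocks two of the crawlers from the access.log list in this question (MJ12bot and ShowyouBot) and leaves every other crawler, including Twitterbot, unrestricted; the URL is a placeholder. Keep in mind that robots.txt is only a request: well-behaved crawlers honor it, but nothing stops a bot that ignores it, which is what a server-side rule (for example in .htaccess) is for.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two bandwidth-heavy crawlers,
# allow everything else (an empty Disallow means "allow all").
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: ShowyouBot
Disallow: /

User-agent: *
Disallow:
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/some/page"  # placeholder URL, not from the question

# Ask the parser which crawlers may fetch the page under these rules.
for agent in ["MJ12bot", "ShowyouBot", "Twitterbot", "Googlebot"]:
    print(agent, robots.can_fetch(agent, url))
```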

When I check access.log, I see the bots listed below.
My concern is: will it be a problem to ban Twitterbot? Who owns this bot, Twitter.com or another website? What would be the drawbacks of disallowing it?

No. | Bot (user agent) | Daily hits
1 | Twitterbot | 1,499
2 | MJ12bot | 1,490
3 | Google AdSense Robot | 774
4 | ShowyouBot (http://showyou.com/crawler) | 655
5 | Googlebot | 595
6 | Bing Robot | 204
7 | Yandex Robot | 186
8 | Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php) | 148
9 | Apple RSS Robot | 126
10 | Mozilla/5.0 (compatible; GrapeshotCrawler/2.0; +http://www.grapeshot.co.uk/crawler.php) | 76
11 | FaceBook Crawler | 62
12 | Alexa Robot | 48
13 | QuerySeekerSpider ( queryseeker.com/bot.html ) | 37
14 | Google Feedfetcher | 28
15 | Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDR; .NET4.0C; .NET4.0E; .NET CLR 1.1.4322; Tablet PC 2.0); 360Spider | 17
16 | Ezooms Robot | 14
17 | AhrefsBot | 10
18 | Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; ) Firefox/1.5.0.11; 360Spider | 9
19 | Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html) | 8
20 | Baidu Spider | 7
21 | Yetibot | 3
22 | Exabot | 2
23 | FeedBot | 2
24 | Mozilla/5.0 (compatible; SISTRIX Crawler; crawler.sistrix.net/) | 2
25 | SeznamBot | 2
26 | Yahoo! Slurp |
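A table like the one above can be produced by counting requests per user-agent string in access.log. The sketch below assumes the Apache/Nginx "combined" log format, where the user agent is the last double-quoted field on each line; the sample lines are invented for illustration, and `count_user_agents` is a hypothetical helper name. It counts raw user-agent strings, so friendly names such as "Bing Robot" in the table would need an extra mapping step.

```python
import re
from collections import Counter

# Assumes the "combined" log format: the user agent is the last
# double-quoted field, optionally followed by trailing whitespace/newline.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per raw user-agent string."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:  # skip lines that don't end in a quoted field
            counts[match.group(1)] += 1
    return counts


# Invented sample lines (documentation IP ranges, made-up timestamps).
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /post/1 HTTP/1.1" 200 5120 "-" "Twitterbot/1.0"',
    '203.0.113.5 - - [10/Oct/2013:13:55:40 +0000] "GET /post/2 HTTP/1.1" 200 4980 "-" "Twitterbot/1.0"',
    '198.51.100.7 - - [10/Oct/2013:13:56:02 +0000] "GET /post/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.4)"',
]

# Print agents from most to least frequent.
for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
    print(f"{hits:>5}  {agent}")
```

To run it on a real log, pass an open file object instead of the sample list, e.g. `count_user_agents(open("access.log", encoding="utf-8", errors="replace"))`; the file name and encoding are assumptions about your server setup.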



@BetL925

Twitterbot is owned by Twitter. Basically, it comes along to index the content of any given URL (as Google does). I'm not sure, but I think they mostly use this data for the snippets (Twitter Cards) shown alongside a tweet that contains a link, i.e. the page title, the description, and an image (if present).

So the only downside I see to blocking it is that links to your site wouldn't show Twitter Cards to other users. However, this could result in a lower click-through rate for links to your website.
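To see what a card would be built from, you can look at the page's card metadata. Twitter Cards are declared with `<meta>` tags such as `twitter:card`, `twitter:title`, `twitter:description` and `twitter:image`, and Twitter can also fall back to Open Graph `og:*` tags for some fields. The sketch below uses Python's standard `html.parser` to collect those tags from a page; the class name and sample page are hypothetical, and this only shows the markup a crawler like Twitterbot would read, not how Twitter itself processes it.

```python
from html.parser import HTMLParser


class CardMetaParser(HTMLParser):
    """Collect twitter:* and og:* <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via
        # HTMLParser's default handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Twitter tags usually use name=, Open Graph tags use property=.
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None and key.startswith(("twitter:", "og:")):
            self.meta[key] = content


# Hypothetical page markup for illustration.
SAMPLE_PAGE = """<html><head>
<title>Example post</title>
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="Example post">
<meta name="twitter:description" content="A short summary of the post.">
<meta property="og:image" content="https://example.com/thumb.png" />
</head><body>...</body></html>"""

card_parser = CardMetaParser()
card_parser.feed(SAMPLE_PAGE)
card_parser.close()
print(card_parser.meta)
```

If a page has none of these tags, there is little for a card to show, so blocking Twitterbot on such pages costs less than on pages with full card markup.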
