
Preventing bot visits to website

@Ravi8258870

Posted in: #Botattack #Seo #Twitter #WebCrawlers

Every time some user shares my website's address inside his/her tweets, the following bots come to my website:


UnwindFetchor/1.0 (+http://www.gnip.com/)
ShowyouBot (http://showyou.com/crawler)
JS-Kit URL Resolver, js-kit.com/
bitlybot
EventMachine HttpClient
MetaURI API/2.0 +metauri.com


About ten times a minute, one of these bots comes to my site and fetches my content. My question is: would banning these bots' IPs with htaccess, or preventing their visits with robots.txt, harm my SEO? Or could it obstruct some basic Twitter functionality? For example, when a user shares my URL, the URL can't be shortened, so he can't share it. Or would Twitter find my site suspicious?


@Shanna517

Modern inbound marketing does not just rely on getting indexed by Google's spiders, or even just Google and Bing/Yahoo. As SEO and SMM become more and more intertwined, more and more social media and social sharing services come into play. As such, you'll see crawlers that aren't just search spiders.

When you post a link on Twitter and it gets shortened by bit.ly, the page gets crawled by:


Twitterbot
Butterfly (http://labs.topsy.com/butterfly/)
Showyoubot (http://showyou.com/crawler)
UnwindFetchor (http://www.gnip.com/)
EventMachine HttpClient (no link)
TweetmemeBot (http://tweetmeme.com/)
JS-Kit URL Resolver (http://js-kit.com/)
PercolateCrawler (ops@percolate.com)
FlipboardProxy (http://flipboard.com/browserproxy)
Yahoo! Slurp (http://help.yahoo.com/help/us/ysearch/slurp)
PaperLiBot (http://support.paper.li/entries/20023257-what-is-paper-li)
Kimengi (nineconnections.com)


What generally happens is that:


First, the main social media site (Twitter, Facebook, Reddit, Digg, etc.) will crawl the page to pull the page title/heading, the meta description, and in some cases the meta keywords, in order to auto-fill certain information for the user, such as the link text, the link description, relevant tags, thumbnail image, and author.
Secondly, as the link gets shared, search engines and other services using the Twitter API or equivalent find out about it, and they too want to add it to their index/database. If it's a search engine, it'll directly improve your search ranking/exposure. If it's another social media site, it'll increase your non-search-engine-related organic traffic.
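To make the first step concrete, here is a rough sketch of what such a crawler might pull out of a page to build a link preview. It uses only Python's standard library; `PreviewParser` and `extract_preview` are names made up for this illustration, and real crawlers almost certainly do more (thumbnails, author tags, and so on).

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects the <title> text and named <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            # Accept both name="..." and property="..." style meta tags.
            name = (attrs.get("name") or attrs.get("property") or "").lower()
            if name and attrs.get("content") is not None:
                self.meta[name] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_preview(html):
    """Return the fields a link preview would typically be filled from."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
    }
```

Calling `extract_preview(page_html)` on a fetched page returns a small dict with the title, description, and keywords, which is roughly the information these bots come to collect.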

Regardless, they need to crawl the page for roughly the same info in order to categorize/process the content. Sometimes the content is analyzed to track trending topics or provide social media analytics. For Flipboard and some enterprise social media management platforms, it's to re-format the content so it can be presented using an alternative interface (e.g. Flipboard's tablet/mobile app, or a 3rd-party social media dashboard). Similarly, some of these bots are using the social sharing APIs to allow syndicating your content.

In any case, this is all mostly good for your site as it'll increase your exposure and facilitate conversation.


Under normal circumstances, a web server should have no problem dealing with these bot requests, and you'll receive many times more organic traffic from them. However, if you're really running an overstressed server, and there aren't any more effective optimizations you can make (query caching, full-page caching, bytecode caching, browser caching, load balancing, using a CDN or a lightweight httpd to serve static content, optimizing your database queries and structure, etc.), then there are a few bots that you can probably block without any harm done.

Most legitimate bots have a URL associated with their UA string. This link should tell you who runs the bot and for what purpose. If the bot is absolutely not directly or indirectly contributing any traffic/exposure to your site, then you can feel free to block it. For instance, if you have very few corporate followers, then you can probably block certain enterprise social media dashboards and social analytics apps. It won't hurt you if Sony or GM doesn't know your sentiments towards their brand or new product. Likewise, a few of these bots are actually for services that are being shut down or have already been shut down (like TweetMeme).
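As a rough illustration of that triage, the sketch below pulls the info URL out of a UA string so you can look the bot up, and checks a UA against a block list. The function names and the contents of `BLOCKED_TOKENS` are assumptions for the example (TweetMeme is used only because it is mentioned above as a shut-down service); the regex only finds URLs with an explicit `http://` or `https://` scheme, so a UA like `MetaURI API/2.0 +metauri.com` yields nothing.

```python
import re

# Hypothetical block list: lowercase substrings of UA strings you decided to block.
BLOCKED_TOKENS = ("tweetmemebot",)


def info_url(user_agent):
    """Return the first http(s) URL embedded in a UA string, or None."""
    match = re.search(r"https?://[^\s;)>]+", user_agent)
    return match.group(0) if match else None


def should_block(user_agent, blocked_tokens=BLOCKED_TOKENS):
    """True if the UA string contains any blocked token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in blocked_tokens)
```

For example, `info_url("UnwindFetchor/1.0 (+http://www.gnip.com/)")` gives you the Gnip URL to read up on before deciding, while `should_block(...)` is the kind of check you would run per request (or translate into a server rule) once you have decided.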

But if you're using something like Percolate to manage your social media identities and monitor your social media analytics, then you obviously don't want to block their bot, or their service will not function properly for you.
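If you go the robots.txt route, one way to avoid blocking a service you depend on is to generate the file from a block list minus a keep list. This is only a sketch: `build_robots_txt` and `is_allowed` are hypothetical helpers, the user-agent tokens are assumed from the bot names listed earlier, and not every bot necessarily honors robots.txt. The check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents, keep_agents=()):
    """Build robots.txt text that disallows blocked agents, except kept ones."""
    keep = {agent.lower() for agent in keep_agents}
    lines = []
    for agent in blocked_agents:
        if agent.lower() in keep:
            continue  # never block a service you rely on
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Everyone else stays fully allowed (an empty Disallow means "allow all").
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check how a robots.txt-compliant parser would treat this agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With `build_robots_txt(["TweetmemeBot", "PercolateCrawler"], keep_agents=["PercolateCrawler"])`, only TweetmemeBot gets a `Disallow: /` entry, and `is_allowed` lets you confirm that PercolateCrawler and Twitterbot can still fetch your pages before you deploy the file.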
