Do I really have to block MJ12Bot (as the prevailing visitor on my site)?
I am all for allowing any legitimate search engine to visit my site, but I've noticed that on my business-card-style website about every other request comes from MJ12Bot. Being a niche SEO bot, it doesn't actually send any human visitors back, so I'm quite disappointed about the noise it generates.
% cut -f12- -d" " constantine.su.access.log | sort | uniq -c | fgrep -i -e bot -e spider | sort -nr | head
421 "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; www.majestic12.co.uk/bot.php?+)
69 "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
64 "woobot/1.1"
62 "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
61 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
39 "Mozilla/5.0 (compatible; SeznamBot/3.2; +http://napoveda.seznam.cz/en/seznambot-intro/)"
30 "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
14 "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
13 "woobot/2.0"
12 "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
Is there a way to quiet down MJ12Bot ambitions (by something like 20×)? Or, due to the distributed nature of the MJ12bot project, do I just have to block 'em all outright as parasitic?
3 Comments
MJ12bot adheres to the robots.txt standard. If you want to prevent the bot from crawling your website, add the following text to your robots.txt:
User-agent: MJ12bot
Disallow: /
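If you want to check the rule before deploying it, here is a minimal sketch using Python's standard urllib.robotparser. It only shows how a standards-following parser reads the two lines above, not how MJ12bot itself behaves; the helper name is_allowed and the "Googlebot" comparison are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the answer above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text.

    Hypothetical helper for illustration; `agent` is the short product token
    (e.g. "MJ12bot"), not the full User-Agent header.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


mj12_allowed = is_allowed(ROBOTS_TXT, "MJ12bot")
googlebot_allowed = is_allowed(ROBOTS_TXT, "Googlebot")
print("MJ12bot allowed:", mj12_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

With this rule, only the MJ12bot group is disallowed; agents with no matching group remain allowed by default.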
From your comments on another answer, MJ12Bot is visiting your site less than once an hour (421 times in 25 days). The best thing to do is to not worry about it. Crawl-Delay is useless for you because no crawler will obey a crawl-delay so large.
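The arithmetic behind that, as a rough sketch: the only inputs are the 421 requests and 25 days quoted above, plus the 20× reduction from the question. It assumes the requests are spread evenly over the period, which real crawl traffic is not.

```python
REQUESTS = 421   # MJ12bot hits from the log excerpt in the question
DAYS = 25        # log period quoted in this answer
REDUCTION = 20   # the "something like 20x" asked for in the question

hours = DAYS * 24
requests_per_hour = REQUESTS / hours

# Average gap between requests, assuming they are evenly spread (a simplification).
seconds_between_requests = hours * 3600 / REQUESTS

# Crawl-Delay that would be needed to stretch that gap by the requested factor.
target_delay = seconds_between_requests * REDUCTION

print(f"{requests_per_hour:.2f} requests/hour")
print(f"{seconds_between_requests:.0f} s between requests on average")
print(f"{target_delay:.0f} s delay for a {REDUCTION}x reduction")
```

That works out to about 0.7 requests per hour, or one request roughly every 85 minutes. A 20× reduction would need a delay of more than a day between requests.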
Is there a way to quiet down MJ12Bot ambitions
The MJ12Bot reportedly obeys robots.txt and the (non-standard) Crawl-Delay directive:
How can I slow down MJ12bot?
You can easily slow down bot by adding the following to your robots.txt file:
User-Agent: MJ12bot
Crawl-Delay: 5
Crawl-Delay should be an integer number and it signifies number of seconds of wait between requests. MJ12bot will make an up to 20 seconds delay between requests to your site - note however that while it is unlikely, it is still possible your site may have been crawled from multiple MJ12bots at the same time. Making high Crawl-Delay should minimise impact on your site.
Reference: mj12bot.com/
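As a sanity check on the syntax, here is a small sketch that parses the quoted snippet with Python's standard urllib.robotparser and reads the delay back. It shows only that the directive is well-formed for one standards-style parser; whether MJ12bot honours the value is down to the bot, as the FAQ quote describes. The "Googlebot" lookup is an illustrative assumption.

```python
from urllib.robotparser import RobotFileParser

# The snippet quoted from the MJ12bot FAQ above.
ROBOTS_TXT = """\
User-Agent: MJ12bot
Crawl-Delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay (in seconds) that applies to the MJ12bot group.
mj12_delay = parser.crawl_delay("MJ12bot")

# Agents without a matching group get no delay from this file.
other_delay = parser.crawl_delay("Googlebot")

# Crawl-Delay alone does not block anything: the bot may still fetch pages.
mj12_can_fetch = parser.can_fetch("MJ12bot", "/")

print(mj12_delay, other_delay, mj12_can_fetch)
```

Note that Crawl-Delay only spaces requests out. To stop the crawl entirely, a Disallow rule is still needed.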