Preventing high bandwidth usage from Yandex
Although I added a rule for Yandex to my robots.txt file, Yandex sometimes crawls my website aggressively. So I hard-coded a check on the user agent and serve a cached file when the user agent looks like this: "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
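The hard-coded part is roughly like the following sketch (simplified; cached.html is just a placeholder for the cached copy I actually serve):

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stripos($ua, 'YandexBot') !== false) {
    // Serve the pre-generated copy instead of building the page
    readfile('cached.html');
    exit;
}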
But when I checked my StatCounter logs recently, I saw that other Yandex-related bots frequently crawl my site. They look similar to the following entries, which I took from my cPanel log:
Beeline (128.69.243.12)
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.2)
Referer: yandex.ru/yandsearch?text=example.com&lr=213
Beeline (89.178.108.247)
Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2)
Referer: yandex.ru/yandsearch?text=example.com&lr=213
How can I block these bots or serve them cached pages?
When I check $_SERVER['HTTP_USER_AGENT'] I can't see "yandex.ru" in the referrer; the referrer comes back empty. Is it possible that the referrer shows up in the cPanel log but can't be read from HTTP_USER_AGENT?
I also don't want to ban IPs, because there are too many IPs involved in this issue and they change periodically. So how can I identify this bot?
Does anybody have a similar issue?
Thank you
1 Comment
Use a robots.txt crawl delay as described in help.yandex.com/search/?id=1112639
Example:
User-agent: Yandex
Crawl-delay: 2 # specifies a 2 second timeout
Before you start banning this bot, you should first verify that the entries in your logs are actually Yandex and not someone else spoofing the user agent to look like Yandex. That is a tactic used by competitors to hustle you into blocking or delaying a bot so they can outrank you. Perform a DNS lookup as described at help.yandex.com/search/?id=1112029
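For example, a rough PHP sketch of that verification (the function name is just illustrative; the yandex.ru / yandex.net / yandex.com suffixes are what Yandex documents for its crawlers): do a reverse DNS lookup on the requesting IP, check the host name, then do a forward lookup and make sure it resolves back to the same IP.

function is_real_yandex($ip) {
    // Reverse lookup: a genuine Yandex crawler resolves to a host under
    // yandex.ru, yandex.net or yandex.com
    $host = gethostbyaddr($ip);
    if ($host === false || !preg_match('/\.yandex\.(ru|net|com)$/i', $host)) {
        return false;
    }
    // Forward lookup: the host must resolve back to the original IP,
    // otherwise the reverse (PTR) record could simply be faked
    return gethostbyname($host) === $ip;
}

// Usage: if (is_real_yandex($_SERVER['REMOTE_ADDR'])) { ... }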
You can serve a cached copy depending on the user agent in a number of ways. If you use Apache you can do it with mod_rewrite rules. If you use PHP you can do it by sniffing the $_SERVER['HTTP_USER_AGENT'] variable, or even by using the get_browser() function. How you build the cache also varies and can be done in 101 ways. Honestly, though, for the best performance you should always be using caching.
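A minimal PHP sketch of the user-agent approach (the bot list and cache layout are placeholder assumptions; adapt them to your own setup):

// Substrings to look for in the user agent (placeholder list)
$bots = array('YandexBot', 'YandexImages', 'Googlebot', 'bingbot');
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Hypothetical cache layout: one static file per requested URL
$cacheFile = __DIR__ . '/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';

foreach ($bots as $bot) {
    if (stripos($ua, $bot) !== false && is_file($cacheFile)) {
        header('Content-Type: text/html; charset=utf-8');
        readfile($cacheFile);
        exit; // the bot gets the cached copy, normal visitors fall through
    }
}
// ...otherwise generate the page as usual.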