Block Yandex crawler
Our site has been behaving very strangely for the last few days: lots of timeouts and similar errors. I finally think I found the cause: the Yandex bot is crawling around 10,000 pages an hour! I need to stop it ASAP; I think it's creating around 50-100 GB of bandwidth usage per day.
Blocked IPs (via myip.ms/info/bots/Google_Bing_Yahoo_Facebook_etc_Bot_IP_Addresses.html):
100.43.90.0/24, 37.9.115.0/24, 37.140.165.0/24, 77.88.22.0/25, 77.88.29.0/24, 77.88.31.0/24, 77.88.59.0/24, 84.201.146.0/24, 84.201.148.0/24, 84.201.149.0/24, 87.250.243.0/24, 87.250.253.0/24, 93.158.147.0/24, 93.158.148.0/24, 93.158.151.0/24, 93.158.153.0/32, 95.108.128.0/24, 95.108.138.0/24, 95.108.150.0/23, 95.108.158.0/24, 95.108.156.0/24, 95.108.188.128/25, 95.108.234.0/24, 95.108.248.0/24, 100.43.80.0/24, 130.193.62.0/24, 141.8.153.0/24, 178.154.165.0/24, 178.154.166.128/25, 178.154.173.29, 178.154.200.158, 178.154.202.0/24, 178.154.205.0/24, 178.154.239.0/24, 178.154.243.0/24, 37.9.84.253, 199.21.99.99, 178.154.162.29, 178.154.203.251, 178.154.211.250, 95.108.246.252, 5.45.254.0/24, 5.255.253.0/24, 37.140.141.0/24, 37.140.188.0/24, 100.43.81.0/24, 100.43.85.0/24, 100.43.91.0/24, 199.21.99.0/24
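(If you end up enforcing that list yourself rather than through Cloudflare, here is a minimal Python sketch using only the standard-library ipaddress module; the ranges shown are an illustrative excerpt of the list above, not the full set.)

import ipaddress

# Illustrative excerpt of the blocked ranges above (not the full list).
BLOCKED_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "100.43.90.0/24",
    "77.88.22.0/25",
    "95.108.150.0/23",
    "178.154.173.29/32",  # bare IPs from the list become /32 networks
)]

def is_blocked(ip: str) -> bool:
    # True if the client IP falls inside any blocked network.
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)

print(is_blocked("100.43.90.15"))  # True
print(is_blocked("203.0.113.7"))   # False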
My robots.txt:
User-agent: Yandex
Disallow: /
User-agent: *
Disallow: ... etc
But it's apparently still crawling, as reported by Cloudflare.
What else can I do to stop it?
Right from the Yandex website:
The User-Agent string Mozilla/5.0 (compatible; Yandex...) identifies Yandex robots. Robots can send GET (for example, YandexBot/3.0) and HEAD (YandexWebmaster/2.0) requests to a server. A reverse DNS lookup can be used to check the authenticity of Yandex robots. More information can be found in the "How to check that a robot belongs to Yandex" section of the Webmaster help.
If you have any questions about our robots, please contact our support service: support@search.yandex.com. If you are experiencing technical issues with our robots, we recommend attaching your server log.
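That reverse DNS check is easy to automate. Here is a minimal Python sketch (standard library only; the Yandex domain suffixes below follow their Webmaster help and are worth re-verifying against the current docs) that confirms an IP really belongs to Yandex:

import socket

YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")

def is_yandex_bot(ip: str) -> bool:
    # Step 1: reverse (PTR) lookup of the client IP.
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    # Step 2: the hostname must sit in a Yandex domain.
    if not host.endswith(YANDEX_SUFFIXES):
        return False
    # Step 3: forward-resolve the hostname; it must map back to the same IP.
    try:
        _, _, addrs = socket.gethostbyname_ex(host)
    except socket.gaierror:
        return False
    return ip in addrs

print(is_yandex_bot("5.255.253.100"))  # True only for a genuine Yandex crawler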
You can email their team and request that they not crawl your server, or block the correct user-agent (a hard block is sketched after the examples below). If your server is overloaded and cannot keep up with the robot's download requests, you should use the Crawl-delay directive. It lets you specify the minimum amount of time (in seconds) between the search robot downloading one page and starting the next.
Examples:
User-agent: Yandex
Crawl-delay: 2 # specifies a 2 second timeout
and
User-agent: *
Disallow: /search
Crawl-delay: 4.5 # specifies a 4.5 second timeout
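If the robot keeps hitting you regardless of robots.txt, a hard block on the User-Agent header is the fallback. As a rough sketch (shown here as Python WSGI middleware; a firewall rule in Cloudflare or your web server achieves the same thing closer to the edge):

def block_yandex(app):
    # Wrap any WSGI app; answer 403 to anything identifying as Yandex.
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if "Yandex" in user_agent:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

Note that a user-agent block only stops clients that identify themselves honestly; the IP ranges and reverse DNS check above cover the rest.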