Block Yandex crawler

@XinRu657

Posted in: #WebCrawlers #Yandex

Our site has been behaving very strangely for the last few days: lots of timeouts, etc. I finally think I found the cause: the Yandex bot is crawling around 10,000 pages an hour! I need to stop it ASAP; I think it's creating around 50-100 GB of bandwidth usage per day.

Blocked IPs (via myip.ms/info/bots/Google_Bing_Yahoo_Facebook_etc_Bot_IP_Addresses.html):

100.43.90.0/24, 37.9.115.0/24, 37.140.165.0/24, 77.88.22.0/25, 77.88.29.0/24, 77.88.31.0/24, 77.88.59.0/24, 84.201.146.0/24, 84.201.148.0/24, 84.201.149.0/24, 87.250.243.0/24, 87.250.253.0/24, 93.158.147.0/24, 93.158.148.0/24, 93.158.151.0/24, 93.158.153.0/32, 95.108.128.0/24, 95.108.138.0/24, 95.108.150.0/23, 95.108.158.0/24, 95.108.156.0/24, 95.108.188.128/25, 95.108.234.0/24, 95.108.248.0/24, 100.43.80.0/24, 130.193.62.0/24, 141.8.153.0/24, 178.154.165.0/24, 178.154.166.128/25, 178.154.173.29, 178.154.200.158, 178.154.202.0/24, 178.154.205.0/24, 178.154.239.0/24, 178.154.243.0/24, 37.9.84.253, 199.21.99.99, 178.154.162.29, 178.154.203.251, 178.154.211.250, 95.108.246.252, 5.45.254.0/24, 5.255.253.0/24, 37.140.141.0/24, 37.140.188.0/24, 100.43.81.0/24, 100.43.85.0/24, 100.43.91.0/24, 199.21.99.0/24
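If you end up enforcing a block list like this in application code rather than at the firewall or in Cloudflare, Python's standard `ipaddress` module can match a request IP against CIDR ranges. This is a minimal sketch using a hypothetical subset of the ranges above; bare addresses in the list parse as /32 networks:

```python
import ipaddress

# A small subset of the blocked ranges listed above (illustrative only).
BLOCKED = [ipaddress.ip_network(cidr) for cidr in [
    "100.43.90.0/24",
    "77.88.22.0/25",
    "178.154.173.29",  # a bare IP is treated as a /32 network
]]

def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)
```

For example, `is_blocked("100.43.90.17")` is True, while an address outside every range is False.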


My robots.txt:

User-agent: Yandex
Disallow: /

User-agent: *
Disallow: ... etc


But it's still apparently crawling as reported by Cloudflare.

What else can I do to stop it?

10.01% popularity Vote Up Vote Down


Login to follow query

More posts by @XinRu657

1 Comment


@Harper822

Straight from the Yandex website:

The User-Agent string Mozilla/5.0 (compatible; Yandex...) identifies Yandex robots. Robots
can send GET (for example, YandexBot/3.0) and HEAD (YandexWebmaster/2.0) requests to a
server. A reverse DNS lookup can be used to check the authenticity of Yandex robots. More
information can be found in the "How to check that a robot belongs to Yandex" section of
the Webmaster help.

If you have any questions about our robots, please contact our support service:
support@search.yandex.com. If you are experiencing technical issues with our robots,
we recommend attaching your server log.


You can email their team and request that they not crawl your server, or block the correct user agent. If your server is overloaded and cannot keep up with the robot's download requests, use the Crawl-delay directive. It lets you specify the minimum amount of time (in seconds) between the robot finishing one page download and starting the next.

Examples:

User-agent: Yandex
Crawl-delay: 2 # specifies a 2 second timeout


and

User-agent: *
Disallow: /search
Crawl-delay: 4.5 # specifies a 4.5 second timeout
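You can sanity-check how a parser reads rules like these with Python's standard `urllib.robotparser`. One caveat, to the best of my knowledge: CPython's parser only honours integer Crawl-delay values, so a fractional delay like the 4.5 above would be ignored by it even though Yandex itself accepts fractional values:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Yandex
Crawl-delay: 2
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

rp.crawl_delay("Yandex")           # -> 2
rp.can_fetch("Yandex", "/search")  # -> False
rp.can_fetch("Yandex", "/")        # -> True
```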
