
Website is being targeted by mail bots

@Jamie184

Posted in: #SearchEngines #Security #WebDevelopment

I have a small website. When I run netstat, it shows a lot of traffic from hosts ending in .p.mail.

I think this is some kind of mail bot, trying to harvest email addresses from my website. How can I prevent this?

tcp 0 64 128.199.152.125:ssh 254.96.96.58.stat:49174 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-7.p.mail:52455 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http crawl-66-249-71-7:39927 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9-5.p.mail:48034 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-6.p.mail:38781 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-3.p.mail:49137 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9.mail.ru:46906 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-3.p.mail:49102 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9-4.p.mail:60833 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-1.p.mail:58404 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-3.p.mail:38515 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http crawl-66-249-71-9:65419 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-4.p.mail:39761 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9-3.p.mail:46664 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-5.p.mail:57961 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-2.p.mail:58029 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-6.p.mail:53075 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9.mail.ru:47363 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-4.p.mail:52394 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9.mail.ru:54476 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9.mail.ru:36110 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-2.p.mail:55155 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-7.p.mail:59306 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-2.p.mail:36667 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9-5.p.mail:51968 ESTABLISHED
tcp6 0 0 128.199.152.125:http fetcher9-4.p.mail:41478 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-5.p.mail:60032 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-2.p.mail:44335 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-6.p.mail:57922 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-1.p.mail:59718 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-3.p.mail:47470 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9-6.p.mail:59941 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-1.p.mail:54604 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9.mail.ru:48307 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-6.p.mail:47410 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-2.p.mail:52740 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9.mail.ru:48957 ESTABLISHED
tcp6 0 0 128.199.152.125:http fetcher9-6.p.mail:55988 ESTABLISHED
tcp6 0 0 128.199.152.125:http fetcher9-6.p.mail:45431 ESTABLISHED
tcp6 0 0 128.199.152.125:http crawl-66-249-71-1:54299 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-1.p.mail:44075 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9-7.p.mail:51332 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9-6.p.mail:40081 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-2.p.mail:47806 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-5.p.mail:40396 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http baiduspider-180-7:53078 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http fetcher9-1.p.mail:46357 CLOSE_WAIT
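
A long listing like this is easier to read as a tally per remote host. The following is a minimal sketch, assuming the six-column layout shown above (protocol, Recv-Q, Send-Q, local address, remote address, state). The function name `count_remote_hosts` and the `SAMPLE` text are illustrative only, and the hostnames appear to be cut off by netstat's column width (`fetcher9-7.p.mail` is presumably a longer name).

```python
from collections import Counter

# A few lines copied from the netstat output above (illustrative sample).
SAMPLE = """\
tcp6 1 0 128.199.152.125:http fetcher9-7.p.mail:52455 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http crawl-66-249-71-7:39927 CLOSE_WAIT
tcp6 0 0 128.199.152.125:http fetcher9-5.p.mail:48034 ESTABLISHED
tcp6 1 0 128.199.152.125:http fetcher9.mail.ru:46906 CLOSE_WAIT
tcp6 1 0 128.199.152.125:http baiduspider-180-7:53078 CLOSE_WAIT
"""


def count_remote_hosts(netstat_text):
    """Count connections per remote host, with the port stripped."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Split on the last colon so the port is dropped from the remote field.
        host, _, _port = fields[4].rpartition(":")
        counts[host] += 1
    return counts


if __name__ == "__main__":
    for host, n in count_remote_hosts(SAMPLE).most_common():
        print(f"{n:3d}  {host}")
```

Grouping by host makes it easier to see which crawlers account for most of the open connections before deciding whether anything needs blocking.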


2 Comments


@Gloria169

I ended up using iptables to block the IPs.

From a root Linux shell I ran:

iptables -A INPUT -s 217.69.133.13 -j DROP
iptables -A INPUT -s 217.69.133.12 -j DROP
iptables -A INPUT -s 217.69.133.10 -j DROP
iptables -A INPUT -s 217.69.133.11 -j DROP
iptables -A INPUT -s 217.69.133.14 -j DROP
iptables -A INPUT -s 217.69.133.15 -j DROP
iptables -A INPUT -s 217.69.133.16 -j DROP
iptables -A INPUT -s 217.69.133.17 -j DROP
iptables -A INPUT -s 217.69.133.18 -j DROP
iptables -A INPUT -s fetcher9.mail.ru -j DROP


After that, the traffic stopped.
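
For a contiguous run of addresses like the one above, the rules can be generated rather than typed one by one. This is only a sketch: `drop_rules` is a hypothetical helper, it prints the commands instead of running them, and it assumes the range 217.69.133.10 to 217.69.133.18 taken from the commands above. Rules added with `iptables -A` live in memory only, so they would need to be saved separately to survive a reboot.

```python
import ipaddress


def drop_rules(first, last):
    """Return one 'iptables ... -j DROP' command per address in [first, last]."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return [
        f"iptables -A INPUT -s {ipaddress.IPv4Address(n)} -j DROP"
        for n in range(start, end + 1)
    ]


if __name__ == "__main__":
    # Prints the commands only; nothing is executed here.
    for rule in drop_rules("217.69.133.10", "217.69.133.18"):
        print(rule)
```

Printing the commands first gives a chance to review the list before pasting it into a root shell.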


@Jamie184

You are okay. Some may argue, but this is not a bad bot. I track bot activity as part of my research, and while I do see a lot of activity from mail.ru, a quick search of my data shows no bad bot activity from it.

Mail.ru operates a search engine.

This is a web crawler for their search engine. The page referred to in the user-agent string is go.mail.ru/help/robots, which you will need your browser to translate. Here is what it says:


A crawler or spider (spider, crawler, bot) is a program that "walks"
URLs on the internet and downloads them for subsequent indexing.
After downloading a document, the robot analyzes it, determines its type,
encoding, and language, and adds the links on the page to the queue for
further crawling. Periodically, the robot returns to previously
visited pages to check that they are still current.

Besides the main robot, which indexes the Internet as a whole,
there are specialized robots that separately download images, videos,
news, RSS, etc. This lets them get certain types of documents into
the search index faster.


You can block their user agent. From their page:


The Disallow directive is used to stop the robot from downloading parts of
the site or the entire site. The value on this line is a partial URL. Examples:


User-agent: Mail.RU_Bot
Disallow: / # blocks access to the entire site

User-agent: Mail.Ru
Disallow: /search # blocks access to pages starting with '/search': /search.html, /search/something, etc.
# Access to the other sections of the site stays open
User-agent: *
Disallow: # allows access to the entire site;
# equivalent to having no robots.txt


You should be able to block the robot from your entire site using:

User-agent: Mail.Ru
Disallow: /


Blocking it should be fine if you do not want Russian traffic. Otherwise, I would leave it alone: mail.ru is said to follow robots.txt, and a quick check of my database shows no problems. It seems to be well behaved.

It will likely take a couple of days for mail.ru to read the robots.txt file and take notice of the change.
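
Before waiting on the crawler, the rules can be sanity-checked locally. The sketch below uses Python's standard `urllib.robotparser`; the helper name `allowed` and the user-agent strings passed to it are assumptions for illustration, and Python's matching is not guaranteed to be identical to how mail.ru's crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Mail.Ru
Disallow: /
"""


def allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Assumed user-agent tokens, used only to exercise the rules.
    print("Mail.RU_Bot:", allowed(ROBOTS_TXT, "Mail.RU_Bot/2.0", "/index.html"))
    print("Googlebot:  ", allowed(ROBOTS_TXT, "Googlebot", "/index.html"))
```

This only confirms that the file is syntactically what you intended; whether the crawler obeys it still has to be checked in the server logs afterwards.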

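If push comes to shove, you can always use .htaccess (assuming Apache) to block access:

RewriteEngine On
# Block clients whose reverse DNS name ends in .mail.ru
RewriteCond %{REMOTE_HOST} ^.*\.mail\.ru$ [NC,OR]
# ...or whose IP address matches this 5.61.x.x pattern
RewriteCond %{REMOTE_ADDR} ^5\.61\.(2*3*[2-9]*)\.([0-2]*[0-5]*[0-5]*)$ [NC]
RewriteRule .* - [F,L]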


But I would not do this too soon. It is likely not necessary.
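
If you do end up adapting the hostname condition, it helps to try the pattern against a few names offline first. This is a rough sketch using Python's `re` module rather than Apache itself; `is_mailru_host` is a hypothetical helper, and the pattern is a variant with the dots escaped so they match literal dots only.

```python
import re

# Hypothetical, tightened variant of the hostname pattern:
# escaped dots match a literal "." instead of any character.
MAILRU_HOST = re.compile(r"^.*\.mail\.ru$", re.IGNORECASE)


def is_mailru_host(hostname):
    """Return True if hostname is a subdomain of mail.ru."""
    return MAILRU_HOST.match(hostname) is not None


if __name__ == "__main__":
    for name in ("fetcher9.mail.ru", "notmail.ru", "fetcher9.mailxru"):
        print(name, is_mailru_host(name))
```

Note that this pattern requires a dot before `mail.ru`, so the bare domain `mail.ru` is not matched.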
