
How to get fewer visits to my website?

@YK1175434

Posted in: #Spam #Traffic

I've read that a huge percentage of traffic comes from bots - legit bots like search engine spiders and illegitimate bots like spammers and ones looking for vulnerabilities.

I'm about to launch my website. The problem is that I am about to sign a copyright license for some of the content on the website, and the contract says that I have to pay a certain sum of money depending on how many visits the site gets (per month). The more visits, the more it will cost.

So I have an incentive to limit traffic to legitimate hits only by blocking the crap. I don't want the website to get popular, so maybe I don't want the search engines to do too much indexing either.

What should I be looking into?


5 Comments


 

@Jessie594

Block all the bad hosting companies. These are the networks that bots and some proxies run from. An ASN-level block denies every single IP address the network owns. Here is a tool to do that: www.enjen.net/asn-blocklist/

Now you need a list of ASNs for bad hosts, meaning hosts that operate known spam networks. Here is a starting list (it leaves out middle-of-the-road offenders like AWS or DigitalOcean):


AS4134 ChinaNet
AS4837 China Unicom Backbone
AS4538 China Education and Research Network Center
AS9808 Guangdong Mobile Com
AS9394 China TieTong Telecommunications Corporation
AS49120 Gorset Ltd
AS44387 PE Radashevsky Sergiy Oleksandrovich
AS47142 PP Andrey Kiselev
AS15895 Kyivstar PJSC
AS50915 S.C. Everhost S.R.L.
AS9829 National Internet Backbone
AS17974 PT Telekomunikasi Indonesia
AS26347 Dream Network LLC
AS43350 NFOrce Entertainment BV
AS63008 Contina
AS53264 Continuum Data Centers, LLC.
AS36352 ColoCrossing
AS16276 OVH SAS
AS57858 Fiber Grid OU
AS53889 Micfo
AS62904 Eonix Corporation 1
AS30693 Eonix Corporation 2
AS55286 B2 Net Solutions Inc.
AS18978 Enzu Inc
AS15003 Nobis Tech Group
AS29761 Quadranet


While you're at it, you can knock down some rental proxies too. They are hard to identify, and their ranges change often, but here is a sample from PacketFlip. You can block them however you like; the easiest way is in .htaccess:

### PacketFlip
deny from 69.162.131.156
deny from 67.202.118.212
deny from 69.162.156.34
deny from 74.91.39.114
deny from 192.241.138.229
deny from 173.234.37.28
deny from 69.162.153.125
deny from 162.213.216.121
deny from 68.234.22.230
deny from 162.223.28.101
deny from 162.213.220.147
deny from 67.202.118.247
deny from 69.162.128.116
deny from 69.197.16.33
deny from 162.223.28.118
deny from 67.202.118.214
deny from 69.162.142.78
deny from 198.199.120.94
deny from 107.170.202.186
deny from 62.210.78.209
deny from 69.197.13.222
deny from 68.71.140.31
deny from 67.215.249.180
deny from 69.197.15.145
deny from 66.63.176.47
deny from 199.19.77.51
deny from 68.234.26.185
deny from 68.234.23.54
deny from 74.91.41.238
deny from 68.234.20.204
deny from 162.216.103.149
deny from 199.19.75.69
deny from 199.192.73.200
deny from 162.217.174.166
deny from 192.199.246.32
deny from 173.234.15.190
deny from 76.191.104.12
deny from 72.8.148.119
deny from 173.234.178.168
deny from 69.162.152.179
deny from 72.11.153.82
deny from 23.19.205.170
deny from 67.202.88.148
deny from 23.239.209.26
deny from 72.8.169.33
deny from 69.197.16.205
deny from 67.202.118.233
deny from 69.197.55.12
deny from 199.233.239.130
deny from 69.162.153.43
deny from 199.192.72.232
deny from 199.192.73.179
deny from 208.117.14.165
deny from 69.162.157.7
deny from 69.162.129.173
deny from 68.68.106.44
deny from 69.162.130.173
deny from 69.162.128.74
deny from 199.101.95.112
deny from 67.202.119.175
deny from 108.62.156.155
deny from 108.62.63.218
deny from 173.234.230.203
deny from 162.223.28.173
deny from 173.208.97.51
deny from 76.191.104.27
deny from 67.202.88.99
deny from 162.217.174.226
deny from 67.202.88.150
deny from 74.91.33.17
deny from 68.234.22.8
deny from 199.19.76.48
deny from 199.19.74.135
deny from 69.162.131.194
deny from 74.91.44.53
deny from 69.162.140.102
deny from 173.234.37.59
deny from 69.162.157.0
deny from 72.11.138.207
deny from 72.8.155.132
deny from 68.234.20.188
deny from 74.91.36.154
deny from 68.234.20.152
deny from 162.223.28.94
deny from 74.91.42.113
deny from 199.101.95.246
deny from 199.101.95.208
deny from 69.197.13.163
deny from 208.117.14.199
deny from 173.234.15.11
deny from 72.11.138.215
deny from 68.71.158.194
deny from 72.20.63.20
deny from 208.117.15.244
deny from 69.162.129.79
deny from 173.234.178.57
deny from 66.39.199.68
deny from 192.199.247.113
deny from 74.91.38.245
deny from 162.223.28.238
deny from 72.11.153.50
deny from 208.117.14.5
deny from 74.91.46.200
deny from 68.234.29.111
deny from 72.8.170.157
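
The `deny from` lines above use the Apache 2.2 access-control syntax, which Apache 2.4 only accepts when mod_access_compat is loaded. A minimal sketch of the same block in native 2.4 syntax, assuming mod_authz_core and mod_authz_host are available and showing only the first three addresses from the list:

```apache
# Apache 2.4+ form of the PacketFlip block (first three addresses only).
# "Require all granted" is needed because a <RequireAll> section that
# contains nothing but negated rules never grants access.
<RequireAll>
    Require all granted
    Require not ip 69.162.131.156
    Require not ip 67.202.118.212
    Require not ip 69.162.156.34
</RequireAll>
```

Each remaining address gets its own `Require not ip` line inside the same section.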


You can also block whole countries in .htaccess the same way. This works well for places like China, Russia, or Ukraine, which, in my experience, send roughly 50 nefarious requests for every legitimate one. Unless you need the traffic from those countries, block them. This tool is good for making GeoIP lists: www.ip2location.com/blockvisitorsbycountry.aspx
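
Country and ASN lists come down to CIDR ranges fed to the same directive as the single addresses above. A rough sketch of what such a list looks like, using the reserved documentation ranges from RFC 5737 as placeholders (they are not real country or ASN allocations):

```apache
### Placeholder ranges only; replace with the list your tool generates
deny from 192.0.2.0/24
deny from 198.51.100.0/24
deny from 203.0.113.0/24
```

One CIDR line covers the whole range, so a generated list stays far shorter than a per-address list would be.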

Now, you can take this further by broadly blocking user agents. The nastiest bots just use normal browser agents, but plenty of others identify themselves. The following list blocks a huge amount of automation: bots, all crawlers, feed readers, and so on. As you can see, you shouldn't use it at all except in an extreme case like the one in your question. Put this in .htaccess to activate it:

# BLOCK USER AGENTS
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} spyder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} crawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} agent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} easy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mega [NC,OR]
RewriteCond %{HTTP_USER_AGENT} collage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} check [NC,OR]
RewriteCond %{HTTP_USER_AGENT} seek [NC,OR]
RewriteCond %{HTTP_USER_AGENT} discover [NC,OR]
RewriteCond %{HTTP_USER_AGENT} navigate [NC,OR]
RewriteCond %{HTTP_USER_AGENT} capture [NC,OR]
RewriteCond %{HTTP_USER_AGENT} generat [NC,OR]
RewriteCond %{HTTP_USER_AGENT} explor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} scout [NC,OR]
RewriteCond %{HTTP_USER_AGENT} accel [NC,OR]
RewriteCond %{HTTP_USER_AGENT} scrub [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} site [NC,OR]
RewriteCond %{HTTP_USER_AGENT} stream [NC,OR]
RewriteCond %{HTTP_USER_AGENT} commer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} walk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} worm [NC,OR]
RewriteCond %{HTTP_USER_AGENT} find [NC,OR]
RewriteCond %{HTTP_USER_AGENT} news [NC,OR]
RewriteCond %{HTTP_USER_AGENT} archive [NC,OR]
RewriteCond %{HTTP_USER_AGENT} book [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cyber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} commun [NC,OR]
RewriteCond %{HTTP_USER_AGENT} download [NC,OR]
RewriteCond %{HTTP_USER_AGENT} index [NC,OR]
RewriteCond %{HTTP_USER_AGENT} deep [NC,OR]
RewriteCond %{HTTP_USER_AGENT} engine [NC,OR]
RewriteCond %{HTTP_USER_AGENT} google [NC,OR]
RewriteCond %{HTTP_USER_AGENT} merch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bing [NC,OR]
RewriteCond %{HTTP_USER_AGENT} msn [NC,OR]
RewriteCond %{HTTP_USER_AGENT} microsoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ask [NC,OR]
RewriteCond %{HTTP_USER_AGENT} jeev [NC,OR]
RewriteCond %{HTTP_USER_AGENT} answer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} blog [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mailto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} thumb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tineye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} shopwiki [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tweet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} xenu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mj12 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} majestic [NC,OR]
RewriteCond %{HTTP_USER_AGENT} magpie [NC,OR]
RewriteCond %{HTTP_USER_AGENT} baidu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pubsub [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ping [NC,OR]
RewriteCond %{HTTP_USER_AGENT} read [NC,OR]
RewriteCond %{HTTP_USER_AGENT} feed [NC,OR]
RewriteCond %{HTTP_USER_AGENT} rss [NC,OR]
RewriteCond %{HTTP_USER_AGENT} fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} seo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sem [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lead [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ahref [NC,OR]
RewriteCond %{HTTP_USER_AGENT} aihit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} java [NC,OR]
RewriteCond %{HTTP_USER_AGENT} validat [NC,OR]
RewriteCond %{HTTP_USER_AGENT} parse [NC]
RewriteRule !^robots\.txt$ - [F]

# BLOCK BLANK USER AGENTS
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]
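
The long chain of `RewriteCond` lines can also be collapsed into a single condition with regex alternation, which is easier to maintain. A minimal sketch, showing only five of the tokens from the list above; which tokens to keep is a judgment call, not a recommendation:

```apache
# Compact form: one condition, tokens separated by "|"
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (bot|spider|crawl|wget|curl) [NC]
RewriteRule !^robots\.txt$ - [F]
```

Short tokens match anywhere in the user-agent string, so a shorter list also lowers the chance of blocking an ordinary browser by accident.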


On top of all that blocking, if something gets through and you still don't want the site indexed, robots.txt can help, but it's more or less a "treaty" that crawlers honor voluntarily. In addition to robots.txt, you can add meta tags or headers that further tell crawlers not to index, archive, cache, etc. The meta version goes in <head>:

<meta name="robots" content="noindex, nofollow, noimageindex, nosnippet, noodp, noarchive">

And the header version is sent from your language or framework. Here's an example using the native header() function in PHP:

header('X-Robots-Tag: noindex, nofollow, noimageindex, nosnippet, noodp, noarchive');
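
If you would rather not touch application code, the same header can be set from .htaccess, which also covers non-HTML files such as images and PDFs. A minimal sketch, assuming mod_headers is enabled on the server:

```apache
# Send the robots directives as a response header on every file served.
# The <IfModule> guard avoids a 500 error if mod_headers is missing.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, nofollow, noimageindex, nosnippet, noarchive"
</IfModule>
```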



 

@Speyer207

In addition to the above, place a file in your root public_html folder called robots.txt containing this:

User-agent: *
Disallow: /



 

@Alves908

"Bad" traffic will ignore your robots.txt and constantly spider you. It will always be changing IP addresses and user-agent combinations, and that is a fact of life. You could look into "honeypot"-type solutions to stop repeat bad traffic that ignores your robots.txt, but blocking it yourself can become impractical to manage unless you love the game "whack-a-mole".

In addition to the authentication already mentioned, consider starting with one of the services that sit in front of your site, giving you a basic CDN and filtering out "bad" traffic based on community metrics. The two most popular ones that I am aware of are CloudFlare and Incapsula.



 

@Rivera981

You can trace where the spammers come from; mostly they use a group of IPs. Block those IPs.

For Google's bots there is little you can do: using robots.txt to disallow your whole domain or a page also cuts off the audience coming from search engines.

When submitting a sitemap, set the page change frequency to "never". That may reduce crawler traffic, although the change frequency is only a hint and crawlers are free to ignore it.



 

@Ogunnowo487

It sounds like the copyrighted content (the content that costs you money when someone visits that page) should be behind some kind of (free?) authentication/login? That would certainly help limit the number of visits to real visitors.

You could perhaps have a non-copyrighted snippet or summary that can be indexed by everyone and that links to the full article, in order to capture the organic search traffic?
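
One simple way to try this without building a login system is HTTP Basic authentication on the directory that holds the licensed content, leaving the summaries outside it public. A minimal sketch, assuming the password file was created with Apache's `htpasswd` utility; the path `/home/example/.htpasswd` and the realm name are hypothetical:

```apache
# Place in the .htaccess of the directory that holds the licensed content.
# Keep the password file outside the web root so it cannot be downloaded.
AuthType Basic
AuthName "Members only"
AuthUserFile /home/example/.htpasswd
Require valid-user
```

Bots that do not have credentials receive a 401 response and never reach the pages. Whether a 401 counts as a visit depends on how the license contract measures traffic.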


