How to not hurt advertisers AND comply with webmaster guidelines
I'm looking for new ways to filter out bad bot traffic so that advertisers don't get harmed with artificial impressions.
I looked at the AdSense support page at support.google.com/adsense/answer/2660562 and it states:
Artificial impressions and clicks that are generated through automated means such as a bot or deceptive software, are prohibited. Automated traffic can be generated by a publisher, or can be received through purchased traffic. It’s important to review traffic sources before you decide to work with a traffic source. Also, be aware of programs that check links within your site, as they may click on ad links as well.
So I thought maybe it's a good idea to make a ridiculously tiny link, located off-screen, that a user would have a hard time clicking, and have it point to a special page that collects info about the robot (including its IP address) and blocks the robot from further access to the site. But then I visited support.google.com/webmasters/answer/35769?hl=en and saw text that includes:
Quality guidelines - specific guidelines
Avoid the following techniques:
Creating pages with little or no original content
Cloaking
Hidden text or links
That tells me such a link would violate the webmaster guidelines: the page the robot lands on would definitely qualify as thin content, since it doesn't even contain two sentences; cloaking could result, because I don't want to block Google from accessing the site if it follows the link, yet I want other bad bots blocked on the spot; and the hidden text or links rule would be violated, because I don't want random text visible for users to see. At any point a child could access the site and be curious to find out what the link is all about, even though I may use a single period as the anchor text.
So what would be the best approach here to eliminating bots while complying with the webmaster guidelines and AdSense, without manually blocking a random set of IP addresses? Should I continue with my idea and make the hidden link for robots, or is there a better approach that Google is more likely to accept?
First things first:
See if there is an existing tool that does what you want. It is possible that ModSecurity or another tool will do this for you. I do not have a list and searching on the web for one is not as easy as you may think, but there are several really good tools out there just for this purpose. I suggest taking this route first.
Please note that the AdSense quote refers to actions you take; it does not refer to bots over which you have no control. I have a set of templates that do not present any Google ads or Analytics, on the off chance that I need to pound my live site with a bot doing an audit. Normally, I keep a local copy of my site, taken directly from the live site, for that purpose. One way people get into trouble is doing SEO or other site audits, or buying traffic. Steer clear of these things except where you are acting against your own site and can put controls in place to prevent harm.
It is very possible to create an anti-bot mechanism that does not violate any of Google's rules, cause any problems with AdSense or Analytics, or confuse or harm a user in the least. It can be a simple extension of your site as it exists.
Your link does not have to be hidden nor does the page have to be thin.
Google does not slap sites for having one thin page, nor does it slap people for bots that they have no control over. Google is fully aware of who these bots belong to and keeps track of bad bots and scraper sites. It does not penalize you for activity that follows patterns it already sees.
You do not necessarily have to have a link, hidden or otherwise. You can use a JS bug or PHP to collect information into a database, and you will begin to see what the bot traffic consists of. You can also serve a code-generated image. As well, I suggest a link that a user is unlikely to click, pointing to a part of your site that is restricted by robots.txt.
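To illustrate the code-generated image idea, here is a minimal sketch of a tracking pixel in PHP. The file name beacon.php, the log location, and the recorded fields are my own assumptions for the example, not anything AdSense requires; embed it in your templates with a normal img tag.

```php
<?php
// beacon.php - hypothetical tracking pixel; embed as <img src="/beacon.php" alt="" width="1" height="1">
// Most page-scraping bots never request images, so a page hit without a beacon hit is one useful signal.

$logFile = __DIR__ . '/bot-beacon.log';                 // assumed log location
$record  = [
    'time'  => date('c'),
    'ip'    => $_SERVER['REMOTE_ADDR'] ?? '',
    'agent' => $_SERVER['HTTP_USER_AGENT'] ?? '',
    'uri'   => $_SERVER['REQUEST_URI'] ?? '',
    'ref'   => $_SERVER['HTTP_REFERER'] ?? '',
];
file_put_contents($logFile, json_encode($record) . PHP_EOL, FILE_APPEND | LOCK_EX);

// Emit a 1x1 transparent GIF so nothing visible is rendered for real users.
header('Content-Type: image/gif');
header('Cache-Control: no-store');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
```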
You will track:
Rogue accesses. (restricted by robots.txt)
Access speed.
Number of accesses per period of time.
Whether an image and possibly JS script is accessed.
Whether robots.txt is accessed.
You will also want to collect (a logging sketch that records these fields follows this list):
Agent name. (for those bots who are honest)
IP address.
Domain name.
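Here is a minimal sketch of that collection step, assuming PHP with PDO, a local SQLite database (traffic.db), and a hypothetical accesses table; include it early in every page template so the fields above get recorded for each request.

```php
<?php
// log_access.php - hypothetical include; call log_access() near the top of every page.

function log_access(): void
{
    $pdo = new PDO('sqlite:' . __DIR__ . '/traffic.db');     // assumed database location
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE IF NOT EXISTS accesses (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        ts    REAL,          -- UNIX timestamp with fractions, for access-speed math later
        ip    TEXT,
        host  TEXT,          -- reverse-DNS host name, may equal the IP when no PTR record exists
        agent TEXT,
        uri   TEXT
    )');

    $ip   = $_SERVER['REMOTE_ADDR'] ?? '';
    $host = $ip !== '' ? gethostbyaddr($ip) : '';

    $stmt = $pdo->prepare('INSERT INTO accesses (ts, ip, host, agent, uri) VALUES (?, ?, ?, ?, ?)');
    $stmt->execute([
        microtime(true),
        $ip,
        $host === false ? '' : $host,
        $_SERVER['HTTP_USER_AGENT'] ?? '',
        $_SERVER['REQUEST_URI'] ?? '',
    ]);
}
```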
You will have to figure out what an acceptable number of accesses, and what access speed, is normal for your site. An access speed of 0.8 seconds between requests is said to be human, though due to network lag these gaps can be as short as 0.4 seconds. You will want to track averages, and determine what average access speed is acceptable as well as what a normal number of pages is for any given user. Bots will stick out like a sore thumb, so there is a gray area in the data between users and bots that will keep you out of trouble.
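As a sketch of that average-speed check, assuming the accesses table above, the report below computes the average gap between requests per IP over the last hour and flags anything below 0.8 seconds; the threshold and window are example values to tune against your own traffic.

```php
<?php
// speed_check.php - hypothetical report: average seconds between requests per IP over the last hour.

$pdo = new PDO('sqlite:' . __DIR__ . '/traffic.db');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('SELECT ip, ts FROM accesses WHERE ts > ? ORDER BY ip, ts');
$stmt->execute([microtime(true) - 3600]);

$byIp = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $byIp[$row['ip']][] = (float) $row['ts'];
}

foreach ($byIp as $ip => $times) {
    if (count($times) < 5) {
        continue;                                       // too few hits to judge
    }
    $gaps = [];
    for ($i = 1, $n = count($times); $i < $n; $i++) {
        $gaps[] = $times[$i] - $times[$i - 1];
    }
    $avg = array_sum($gaps) / count($gaps);
    // Sub-second averages over many pages are unlikely to be human: flag for review, do not auto-block yet.
    if ($avg < 0.8) {
        printf("%s: %d hits, avg gap %.2fs - possible bot\n", $ip, count($times), $avg);
    }
}
```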
You will want to track whether robots.txt is accessed and whether it is obeyed. You may need to trawl your log files for this. However, it is possible to use code to present your robots.txt instead of a static file.
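One way to do that, sketched below, is a PHP script that logs every fetch before printing the rules; it assumes an Apache-style rewrite such as RewriteRule ^robots\.txt$ /robots.php [L] so the /robots.txt URL is actually served by PHP, and the disallowed trap path is a made-up example.

```php
<?php
// robots.php - hypothetical dynamic robots.txt: record who fetched it, then print the rules.

$record = sprintf("%s %s %s\n",
    date('c'),
    $_SERVER['REMOTE_ADDR'] ?? '',
    $_SERVER['HTTP_USER_AGENT'] ?? ''
);
file_put_contents(__DIR__ . '/robots-fetches.log', $record, FILE_APPEND | LOCK_EX);

header('Content-Type: text/plain');
// The disallowed path is the trap area mentioned earlier; any later hit on it from a client
// that fetched these rules is a bot ignoring robots.txt.
echo "User-agent: *\n";
echo "Disallow: /private-trap/\n";
```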
You will always want to get the IP address. If the host name from a reverse lookup does not resolve back to the same IP address, it may not be valid. Please be aware that invalid domain names are used, as well as machine/host names. You want to store the IP address and domain name together for research.
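This is the forward-confirmed reverse DNS check, sketched below in PHP; the googlebot.com / google.com suffix test matches Google's documented way of verifying Googlebot, while the helper function name is my own.

```php
<?php
// verify_host.php - forward-confirmed reverse DNS: IP -> host name -> back to the same IP.

function verified_host(string $ip): ?string
{
    $host = gethostbyaddr($ip);               // PTR lookup; returns the IP unchanged when no record exists
    if ($host === false || $host === $ip) {
        return null;                           // no usable host name
    }
    $forward = gethostbyname($host);           // A lookup; returns the name unchanged on failure
    return $forward === $ip ? $host : null;    // only trust names that resolve back to the same IP
}

// Example: treat a claimed Googlebot as genuine only if its verified host ends in googlebot.com or google.com.
$ip       = $_SERVER['REMOTE_ADDR'] ?? '';
$host     = $ip !== '' ? verified_host($ip) : null;
$isGoogle = $host !== null && preg_match('/\.(googlebot|google)\.com$/i', $host) === 1;
```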
Bad bots often come from non-subscriber IP address blocks, though not exclusively. A subscriber block belongs to a telco; non-subscriber blocks are often web-host IP address blocks. It is possible that telco blocks are used from places like China and Russia. You can keep track of non-subscriber IP address blocks in a blacklist as they appear, as well as a limited number of subscriber blocks. You should also create a whitelist, and obviously you will want to put the search engines into it.
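A minimal sketch of IPv4 CIDR matching for those lists follows; the ranges shown are placeholders (one commonly published Googlebot range and one reserved documentation range), so build your own lists from the data you collect.

```php
<?php
// blocklist.php - hypothetical IPv4 CIDR matching for a whitelist and a blacklist.

function ip_in_cidr(string $ip, string $cidr): bool
{
    [$subnet, $bits] = explode('/', $cidr);
    $ipLong  = ip2long($ip);
    $netLong = ip2long($subnet);
    if ($ipLong === false || $netLong === false) {
        return false;                                            // not valid IPv4
    }
    $mask = (int) $bits === 0 ? 0 : (~0 << (32 - (int) $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

function ip_in_list(string $ip, array $cidrs): bool
{
    foreach ($cidrs as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

// Placeholder ranges for illustration only.
$whitelist = ['66.249.64.0/19'];     // commonly published Googlebot range; verify before relying on it
$blacklist = ['203.0.113.0/24'];     // reserved documentation range standing in for a bad web-host block

$ip = $_SERVER['REMOTE_ADDR'] ?? '';
if ($ip !== '' && !ip_in_list($ip, $whitelist) && ip_in_list($ip, $blacklist)) {
    http_response_code(403);
    exit('Forbidden');
}
```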
Lastly, really bad bots never use a valid Agent name; however, unwanted but honest bots do. Do not take any action on Agent names except for known bots that are honest about who they are. Agent names are so incredibly unreliable that I ignore them entirely for automated decisions, except for those unwanted bots that are honest.
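If you do act on those honest-but-unwanted crawlers, a simple substring match against their self-declared names is about as far as the Agent header can be trusted; the names below are examples of crawlers that do identify themselves, and the list is an assumption for you to maintain.

```php
<?php
// agent_check.php - hypothetical: act only on bots that honestly identify themselves.

$honestUnwanted = ['AhrefsBot', 'SemrushBot', 'MJ12bot'];   // example self-identifying crawlers; adjust to taste

$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
foreach ($honestUnwanted as $name) {
    if (stripos($agent, $name) !== false) {
        http_response_code(403);          // or just log and allow; the point is these names can be trusted
        exit;
    }
}
// Anything else: ignore the Agent header entirely for automated decisions.
```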
One warning: at first you will want to capture information only, to get a feel for your site, and not block any accesses for a period. From there, be cautious when blocking accesses while you feel out what is acceptable. This is not a task for the faint of heart, and you will be researching and writing code. Keep the system simple and think it through.
You may want to create a page that Google will never see but bad bots will, where you completely deny access to your site. A 404 error is a good alternative too. One option is to offer a redirect off-site; my favorite advice is to redirect back to the domain name or IP address that is accessing your site. Just make sure that any page you present does not carry Google AdSense or Analytics, so as not to skew results or cause heartburn with advertisers. This can easily be done with a relatively naked HTML page.
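A sketch of such a trap response follows, with no ad or analytics code on it; the three options (plain deny, 404, or bouncing the client back at its own address) sit behind a hypothetical $mode switch, and it reuses the beacon log from the earlier sketch.

```php
<?php
// trap.php - hypothetical handler for the robots.txt-restricted trap area. No AdSense or Analytics here.

$ip   = $_SERVER['REMOTE_ADDR'] ?? '';
$mode = 'deny';                           // 'deny', 'not_found', or 'bounce' - pick one policy

// Record the hit first.
file_put_contents(
    __DIR__ . '/bot-beacon.log',
    json_encode(['time' => date('c'), 'ip' => $ip, 'trap' => $_SERVER['REQUEST_URI'] ?? '']) . PHP_EOL,
    FILE_APPEND | LOCK_EX
);

switch ($mode) {
    case 'not_found':
        http_response_code(404);          // pretend the page does not exist
        break;
    case 'bounce':
        // Redirect the client back to its own address, as suggested above.
        header('Location: http://' . $ip . '/', true, 302);
        break;
    case 'deny':
    default:
        http_response_code(403);
        break;
}
echo '<!doctype html><title>Nothing here</title>';   // relatively naked HTML, no ads or analytics
```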
You will find a few benefits to creating a system such as this.
Spam links to your site will reduce over time.
Scraper site pages will reduce where that is a problem.
Some bad bots will completely stop accessing your site.
Your Google Analytics will become cleaner.
Your Google AdSense impressions may go down and the CTR may improve.