What is the best way to detect whether traffic is legitimate or coming from bots?
We are planning to start a program similar to YouTube, where users get paid per view.
The problem we are facing is that people will game the system and send traffic from botnets to gain an undue advantage.
I know there may be no 100% reliable way to discount such traffic, but what is the best way to filter it?
Find out how many videos are viewed by normal users per hour.
Let's assume it is 10 videos per hour for most visitors.
Store and track the number of videos viewed using a session + cookie + IP address.
If the number of videos viewed is higher than 10 for any session, show a simple CAPTCHA to stop the bots.
If CAPTCHA verification fails repeatedly, add the IP address to a blacklist and treat it in a special way, i.e. introduce more CAPTCHAs incrementally.
You can tell the difference between bots and users fairly easily using the combination of IP address + cookie + session variables.
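A minimal sketch of that counting logic, assuming a Flask app with cookie-backed sessions; the 10-views-per-hour threshold and the require_captcha / record_paid_view helpers are placeholders for illustration, not part of the original answer:

    import time
    from flask import Flask, session, request

    app = Flask(__name__)
    app.secret_key = "change-me"          # needed for signed session cookies

    HOURLY_LIMIT = 10                     # "normal" views per hour (assumption)
    WINDOW_SECONDS = 3600

    def require_captcha(ip):              # placeholder: send the visitor to a CAPTCHA page
        return "captcha required", 429

    def record_paid_view(video_id, ip):   # placeholder: credit the uploader for the view
        pass

    @app.route("/view/<video_id>")
    def count_view(video_id):
        now = time.time()
        # keep only the timestamps from the last hour for this session
        views = [t for t in session.get("views", []) if now - t < WINDOW_SECONDS]
        views.append(now)
        session["views"] = views
        if len(views) > HOURLY_LIMIT:
            return require_captcha(request.remote_addr)
        record_paid_view(video_id, request.remote_addr)
        return "ok"

In a real system the counter would also be keyed by IP address server-side, since a bot can simply discard the session cookie.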
A few more methods of detecting bot traffic:
- Verify the user agent (the client application that speaks a particular network protocol); a quick sketch follows this list.
- Look for a highly specific match to something like a malware signature, a specific executable, or a C&C connection address.
- Examine behavioral parameters such as depth of view, duration of visit, engagement and similar signals.
- Software solutions. The two solutions I know of for filtering traffic and examining its quality are Google Analytics and Maxymizely.com. Use GA to filter hits from known botnets and referrers: go to Admin settings -> View Settings -> Bot Filtering and tick the checkbox that reads 'Exclude all hits from known bots and spiders'. The second solution, Maxymizely.com, lets you analyze traffic quality along three dimensions - activity, engagement and monetization. The tool lets you choose the significant sub-parameters for every dimension, assign the required weight to them, and visualize the result as a 3D map.
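A rough sketch of the user-agent check from the first bullet; the signature list is illustrative and far from exhaustive:

    KNOWN_BOT_SIGNATURES = ("bot", "crawler", "spider", "curl", "python-requests")  # illustrative

    def looks_like_bot(user_agent):
        # Missing or obviously automated User-Agent headers are treated as bots.
        if not user_agent:
            return True
        ua = user_agent.lower()
        return any(sig in ua for sig in KNOWN_BOT_SIGNATURES)

    print(looks_like_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
    print(looks_like_bot("python-requests/2.31"))                       # True

Keep in mind that the User-Agent header can be spoofed, so this is only a first-pass filter.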
Since you use tags that start with web, I assume you are building a system that requires HTML to be downloaded in order for a paid impression to count.
What you need to do is learn robot behaviour by looking at the server access log files. On a server with Apache installed, the file is typically named access_log. If your server is used frequently, you will see hundreds, if not thousands, of lines in it. Each line contains the IP address of the remote device that connected to your server, the resource it requested, and the date and time.
Generally, there is at least one second between the moment a person goes from one page to the next on a website, unless the website is a poorly designed high-speed guessing game with no instructions, and even then the interval is unlikely to be shorter than one second due to initial connection latency.
If you see the same IP address listed with the exact same date and time for the next 20+ lines straight, it is clearly a robot trying to attack your system (perhaps attempting a denial-of-service).
Another way to spot a robot is to look at the files being requested, especially in the error logs. If you keep seeing a similar pattern of requests over and over, especially for files that do not exist, one or more systems may be trying to break into your server on the assumption that you have a content management system such as WordPress installed.
Also, some robots may request files with mangled names, or may falsely identify themselves. Any request line containing source code is likely coming from a robot, for example code containing this: (:;){}
Because I am unsure which server (Apache? nginx?) is serving the content on your system, I cannot tell you which column in the log files holds the IP address or the requested file, but when you look at them, especially right after making a request yourself, the patterns become easier to spot.
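A rough sketch of that kind of log scan, assuming Apache's common/combined log format; the file path and the 20-requests-in-one-second threshold are assumptions taken from the description above:

    import re
    from collections import Counter

    # Matches the start of an Apache common/combined log line: IP, then the bracketed timestamp.
    LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
    BURST_THRESHOLD = 20   # same-second requests from one IP that we treat as suspicious

    def find_bursty_ips(path="access_log"):
        per_ip_second = Counter()
        with open(path) as log:
            for line in log:
                match = LOG_LINE.match(line)
                if not match:
                    continue
                ip, timestamp = match.groups()
                # timestamp has second resolution, e.g. "10/Oct/2024:13:55:36 +0000"
                per_ip_second[(ip, timestamp)] += 1
        return sorted({ip for (ip, _), count in per_ip_second.items() if count >= BURST_THRESHOLD})

    if __name__ == "__main__":
        print(find_bursty_ips())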
Get a list of known referral-spam URLs and filter them out in Analytics.
megalytic.com/blog/how-to-filter-out-fake-referrals-and-other-google-analytics-spam
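If you also want to drop such hits server-side rather than only in Analytics reports, a minimal sketch might look like this; the two domains are commonly cited referral-spam examples, not a complete list:

    from urllib.parse import urlparse

    REFERRAL_SPAM_DOMAINS = {"semalt.com", "buttons-for-website.com"}  # illustrative entries

    def is_spam_referrer(referer):
        host = urlparse(referer).hostname or ""
        # match the domain itself or any of its subdomains
        return any(host == d or host.endswith("." + d) for d in REFERRAL_SPAM_DOMAINS)

    print(is_spam_referrer("http://semalt.com/crawler"))      # True
    print(is_spam_referrer("https://www.example.com/page"))   # False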