What is the best way to detect whether traffic is legitimate or from a bot?

@Cugini213

Posted in: #Botattack #WebCrawlers #WebTraffic

We are planning to start a program similar to YouTube, where users get paid per view.

The problem we are facing is that people will game the system and send traffic from botnets to gain an undue advantage.

I know there may be no 100% reliable way to discount such traffic, but what is the best way to filter it?


4 Comments


 

@Steve110

Find out how many videos normal users view per hour.

Let's assume most visitors view 10 videos per hour.

Store and check the number of videos viewed, keyed by a combination of session, cookie, and IP address.

If the number of videos viewed in any session exceeds 10, present a simple CAPTCHA to screen out bots.

If CAPTCHA verification fails repeatedly, add the IP address to a blacklist and treat it specially, e.g. introduce more CAPTCHAs incrementally.

Using the combination of IP address, cookie, and session variables, you can tell bots and users apart fairly easily.
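
A minimal sketch of that counting-and-escalation scheme, assuming an in-memory store and a 10-views-per-hour threshold (in production you would use Redis or similar, and tune the limit from your own per-user stats):

    # Per-session view throttling sketch; the threshold, window, and
    # key format are illustrative assumptions, not prescriptions.
    import time
    from collections import defaultdict

    VIEW_LIMIT_PER_HOUR = 10       # tune from your real usage stats
    WINDOW_SECONDS = 3600

    view_log = defaultdict(list)   # (ip, session_id) -> view timestamps
    blacklist = set()              # IPs that failed CAPTCHA repeatedly

    def record_view(ip, session_id):
        """Return 'ok', 'captcha', or 'blocked' for this view attempt."""
        if ip in blacklist:
            return "blocked"
        now = time.time()
        key = (ip, session_id)
        # keep only timestamps inside the one-hour window
        view_log[key] = [t for t in view_log[key] if now - t < WINDOW_SECONDS]
        view_log[key].append(now)
        if len(view_log[key]) > VIEW_LIMIT_PER_HOUR:
            return "captcha"       # challenge before counting the view
        return "ok"

    def report_captcha_failure(ip, failures, max_failures=3):
        # escalate: repeated failures land the IP on the blacklist
        if failures >= max_failures:
            blacklist.add(ip)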



 

@Sims2060225

A few more methods of detecting bot traffic:

- Verification of the user agent (the client application that speaks a particular network protocol); see the sketch after this list.
- Looking for a highly specific match to something like a malware signature, a specific executable, or a command-and-control (C&C) connection address.
- Examining behavioral parameters such as depth of view, duration of visit, engagement, and so on (also covered in the sketch below).
- Software solutions. Two tools I know of for filtering traffic and examining its quality are Google Analytics and Maxymizely.com. Use GA to filter hits from known botnets and referrers: go to Admin -> View Settings -> Bot Filtering and tick the checkbox that reads 'Exclude all hits from known bots and spiders'. Maxymizely.com analyzes traffic quality along three dimensions: activity, engagement, and monetization. It lets you choose the significant sub-parameters for each dimension, assign them the required weight, and visualize the result as a 3D map.
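
As a rough sketch of the first and third checks, here is what user-agent verification and a simple behavioral screen might look like; the bot fragments and the thresholds are illustrative assumptions, not a complete signature list:

    # Two simple screens: user-agent verification and an engagement
    # heuristic. Extend the fragment list with real signature data.
    KNOWN_BOT_FRAGMENTS = ("bot", "crawler", "spider", "curl", "python-requests")

    def looks_like_bot_ua(user_agent):
        ua = (user_agent or "").lower()
        # an empty UA, or one containing a known crawler fragment, is suspect
        return not ua or any(frag in ua for frag in KNOWN_BOT_FRAGMENTS)

    def suspicious_behavior(pages_viewed, visit_seconds, events_fired):
        # bots often show near-zero depth of view, duration, and engagement
        return pages_viewed <= 1 and visit_seconds < 2 and events_fired == 0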



 

@Annie201

Since you use tags that start with web, I assume you are building a system that requires HTML to be downloaded in order for a paid impression to work.

What you need to do is learn robot behaviour by looking at the server access log files. On a server with Apache installed, the file is typically named access_log. If your server is used frequently, you will see hundreds, if not thousands, of lines in it. Each line contains the IP address of the remote device that connected to your server, the resource it requested, and the date and time.

Generally, there is at least one second between a person going from one page to the next on a website, unless of course the site is a poorly designed high-speed guessing game with no instructions; even then, the interval is rarely shorter than one second due to initial connection latency.

If you see the same IP address listed with the exact same date and time for 20+ lines straight, then it's clearly a robot attacking your system (perhaps attempting a denial-of-service attack).
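
A short sketch of that burst check, assuming Apache's default combined log format (column positions may differ on your setup); the log path and the 20-request threshold are examples:

    # Flag any IP that issues 20+ requests with an identical timestamp
    # in an Apache combined-format access log.
    import re
    from collections import Counter

    LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')  # IP, timestamp

    def find_bursting_ips(log_path, threshold=20):
        hits = Counter()
        with open(log_path) as log:
            for line in log:
                m = LINE_RE.match(line)
                if m:
                    hits[(m.group(1), m.group(2))] += 1  # (ip, timestamp)
        return {ip for (ip, ts), n in hits.items() if n >= threshold}

    # e.g. print(find_bursting_ips("/var/log/apache2/access_log"))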

Another way to check whether it's a robot is to look at the files being requested, especially in the error logs. If you repeatedly see a similar pattern of requests, especially for files that don't exist, then one or more systems may be trying to break into your server on the assumption that you have a content management system such as WordPress installed.

Also, some robots may request files with mangled names, or may falsely identify themselves. Any log line containing source code is likely to come from a robot, for example code containing this: (:;){}
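
Both of these checks can be scripted. A sketch, where the probe paths and the set of code-like characters are illustrative assumptions:

    # Flag requests for CMS paths that don't exist on your site, or
    # request lines containing code-like characters.
    PROBE_PATHS = ("/wp-login.php", "/wp-admin", "/xmlrpc.php", "/phpmyadmin")
    CODE_CHARS = set("{};$`")

    def is_probe_request(request_path):
        path = request_path.lower()
        return (any(path.startswith(p) for p in PROBE_PATHS)
                or any(c in path for c in CODE_CHARS))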

Because I don't know which server (Apache? nginx?) is serving the content on your system, I can't tell you which column in the log files holds the IP address or the requested file, but when you look at them, especially right after making a request to the server yourself, the patterns become easier to spot.



 

@Alves908

Get a list of known referral spam URLs and filter them out through Analytics:
megalytic.com/blog/how-to-filter-out-fake-referrals-and-other-google-analytics-spam
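
Beyond the Analytics filter, the same idea can be applied server-side before a view is counted. A sketch, where the listed domains are placeholders for a maintained spam list such as the one in the linked article:

    # Drop hits whose Referer header matches a known spam domain.
    from urllib.parse import urlparse

    SPAM_REFERRERS = {"semalt.com", "buttons-for-website.com"}  # placeholders

    def is_spam_referral(referer_header):
        if not referer_header:
            return False
        host = urlparse(referer_header).hostname or ""
        # match the domain itself and any of its subdomains
        return any(host == d or host.endswith("." + d) for d in SPAM_REFERRERS)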


