
Is there a good way to filter all the robots and domains they are on?

@Looi9037786

Posted in: #Logging #WebCrawlers

I have been working more on the bot filter for my website, but by no means is it complete.

So far I have the main ones:

Google,
Yahoo,
MSN,
Baidu,
Amazon,
and a few others...

Right now, I am using a filter that compares the referrer URL, the existing domain, and known browser vs. non-browser user agents.
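
For illustration, the user-agent part of such a filter might look like the sketch below. This is not the poster's actual filter: the function name `classify_user_agent`, the token list, and the three labels are assumptions made for the example, and the token list is deliberately short (it covers the Google, Yahoo, MSN/Bing, and Baidu crawlers only). User-agent strings are trivially spoofed, so the result is a hint rather than proof.

```python
import re

# Illustrative substrings only (assumed list); a real filter needs regular upkeep.
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot", "bingbot", "baiduspider")

# Very rough test for "looks like a browser": most browsers send Mozilla/N or Opera/N.
BROWSER_PATTERN = re.compile(r"mozilla/\d|opera/\d", re.IGNORECASE)


def classify_user_agent(user_agent):
    """Return 'known_bot', 'browser', or 'suspect' for a User-Agent header value."""
    ua = (user_agent or "").lower()
    if not ua:
        # A missing or empty header is unusual for a real browser.
        return "suspect"
    # Check bot tokens first: many crawlers also include "Mozilla/5.0".
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return "known_bot"
    if BROWSER_PATTERN.search(ua):
        return "browser"
    return "suspect"
```

Checking the bot tokens before the browser pattern matters, because crawlers such as Googlebot identify themselves inside a string that otherwise looks like a browser's.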

Are there any other good techniques to detect if the hit is coming from a bot or not?



@XinRu657

Are there any other good techniques to detect if the hit is coming from a bot or not?


It depends upon which variety of bot you're hunting. Here are a few tips for isolating malicious bots:


Look for hits on non-existent (or restricted access) administrative scripts, e-mail scripts, et cetera
Look for nearly-instantaneous retrieval of your site content
Look for repeated hits on your feeds (particularly if you know that your content is being used on scraper sites)
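
The three checks above could be sketched roughly as follows, assuming the access log has already been parsed into `(ip, unix_time, path)` tuples. The function name `suspicious_ips`, the trap and feed paths, and every threshold are hypothetical values chosen for the example; they would need tuning against real traffic.

```python
from collections import defaultdict

# All paths and thresholds below are assumptions for illustration.
TRAP_PATHS = {"/admin.php", "/formmail.pl"}  # scripts that do not exist on the site
FEED_PATHS = {"/feed", "/rss.xml"}           # feed URLs to watch
BURST_COUNT = 10     # this many hits ...
BURST_WINDOW = 2.0   # ... within this many seconds counts as a burst
FEED_LIMIT = 20      # more feed hits than this in the analysed log is suspicious


def suspicious_ips(hits):
    """hits: iterable of (ip, unix_time, path). Returns {ip: set of reason strings}."""
    reasons = defaultdict(set)
    times = defaultdict(list)
    feed_hits = defaultdict(int)

    for ip, ts, path in hits:
        if path in TRAP_PATHS:
            reasons[ip].add("trap_path")
        if path in FEED_PATHS:
            feed_hits[ip] += 1
        times[ip].append(ts)

    for ip, count in feed_hits.items():
        if count > FEED_LIMIT:
            reasons[ip].add("feed_polling")

    for ip, stamps in times.items():
        stamps.sort()
        # Sliding window: BURST_COUNT consecutive hits inside BURST_WINDOW seconds.
        for i in range(len(stamps) - BURST_COUNT + 1):
            if stamps[i + BURST_COUNT - 1] - stamps[i] <= BURST_WINDOW:
                reasons[ip].add("burst")
                break

    return dict(reasons)
```

An address that shows up with more than one reason is a stronger candidate than one flagged only once; legitimate crawlers and feed readers can trip the burst or feed checks on their own.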


You might also check out user-agent.org for some of the more obscure search bot agents you can expect to see, and review the thread "Where can I find a list of search engine crawler user agents and their domain names?", started by Dev a few days ago.
