Blocking all search engines except the big ones
I would like to block all search engines except Google, Yahoo & Bing (and their related sites like Google Images) from crawling my site, as they consume a lot of server resources and bandwidth but don't bring any traffic.
Is this easily done or difficult? It would be good if someone maintained a list of small search engines that could be pasted into a robots.txt file to block them.
Also, I realize I cannot block crawlers that ignore robots.txt, or stop sites from surreptitiously scraping and crawling, but that is not what I am after. I just want to stop all the Altavistas, Hotbots, Lycoses (do these even still exist?) and the university experiment crawlers from wasting my time.
More posts by @Murphy175
2 Comments
For the ones that don't follow the rules you can try to find them in your logs and then block them by IP.
Generally you can spot a bot by the fact that it reads the pages too fast to be human.
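As a rough illustration of the "too fast to be human" check, here is a minimal Python sketch that scans access log lines and flags IPs exceeding a request rate. It assumes the log is in Common/Combined Log Format; the function name `fast_clients` and the thresholds (30 requests in 10 seconds) are hypothetical and would need tuning for a real site.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed format: Common/Combined Log Format, e.g.
# 203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 123
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def fast_clients(lines, max_requests=30, window=timedelta(seconds=10)):
    """Return the set of IPs that made more than max_requests within any window."""
    times_by_ip = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip, stamp = match.groups()
        times_by_ip[ip].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))

    flagged = set()
    for ip, times in times_by_ip.items():
        times.sort()
        start = 0
        for end, current in enumerate(times):
            # Move the window start forward until it is within `window` of this request.
            while current - times[start] > window:
                start += 1
            if end - start + 1 > max_requests:
                flagged.add(ip)
                break
    return flagged
```

You could feed it an open log file, for example `fast_clients(open("access.log"))` (the filename is a placeholder), and then review the flagged addresses by hand before blocking anything, since a shared proxy or a legitimate crawler can also look fast.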
How big of an issue is it really?
The bots you should be concerned about are the ones that don't follow the rules and pretend to be regular visitors.
Search engine traffic is legitimate and, as Dan pointed out, Google also started as a small university project. It isn't really fair to discriminate against the small guys, and possibly not smart in the long run.
Kinopiko's answer will work, and Google's webmaster tools will let you create and test your robots.txt (Site configuration, Crawler Access), but I think that if traffic from genuine search engines is a problem for you, it may be that your current hosting solution is not a good deal.
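For reference, an allowlist-style robots.txt can also be generated and sanity-checked locally. The sketch below is only one possible approach, not necessarily what Kinopiko's answer proposed: it builds a file that allows a few named crawlers and disallows everyone else, then checks it with Python's standard `urllib.robotparser`. The user-agent tokens in `ALLOWED_BOTS` and the helper names are assumptions; confirm the tokens against each search engine's own documentation before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; verify them against each crawler's documentation.
ALLOWED_BOTS = ["Googlebot", "Googlebot-Image", "Bingbot", "Slurp"]


def build_robots_txt(allowed=ALLOWED_BOTS):
    """Allow the named crawlers everywhere and disallow every other user agent."""
    # An empty "Disallow:" line means nothing is disallowed for that crawler.
    groups = [f"User-agent: {name}\nDisallow:\n" for name in allowed]
    # The catch-all group applies to any crawler not named above.
    groups.append("User-agent: *\nDisallow: /\n")
    return "\n".join(groups)


def can_crawl(robots_txt, user_agent, url="/"):
    """Check a robots.txt string the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(build_robots_txt())
```

This only affects crawlers that honor robots.txt, which matches the question's stated goal; anything that ignores the file still has to be handled through logs and IP blocks.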