Will banning bots make a site harder to find on search engines?
I'm running Apache 2 and a large part of our page views come from bots. Most of those are legitimate ones such as Google and Bing.
I want to parse the logs and get accurate statistics as to how many human visitors I get, so I've temporarily updated robots.txt to ban bots on all pages. I know this is only partially effective at preventing bot access, but I'm OK with that.
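For reference, the blanket ban is presumably just a catch-all disallow at the site root:

    # robots.txt: asks every crawler to skip every page.
    # Well-behaved bots honor this; plenty of others ignore it.
    User-agent: *
    Disallow: /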
How will disallowing bots affect users searching for this site? Will it prevent users from finding the page on Google?
Since your stated intention is to
"get accurate statistics as to how many human visitors I get,"
the appropriate solution is to use a service like Google Analytics or New Relic. Once signed up, you insert a snippet of JavaScript into your pages (many engines such as WordPress can do this automatically or with a plugin), which sends information to the monitoring service. Once set up, such a service will give you a wealth of information about your visitors. Google Analytics is quite amazing in the detail with which it tracks user interactions with your site.
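For example, Google's documented install tag for Google Analytics is a small block pasted into the <head> of every page (the G-XXXXXXXXXX measurement ID below is a placeholder you receive when you sign up):

    <!-- Google tag (gtag.js): goes in the <head> of every page -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'G-XXXXXXXXXX'); // placeholder measurement ID
    </script>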
These services are implemented in such a way as to track only real humans. It would be folly to try to re-implement what they already do so well, and they are so useful that it's almost folly not to use one.
The correct answer is to not mess with robots.txt and instead parse your logs, looking at the User-Agent header as mentioned in the comments. Google, Yahoo, etc. should identify themselves as bots using this header, and disallowing bots via robots.txt would be like driving a truck through your search-engine ranking. As @adria said, there are tools out there that can do this for you. A very popular one is Google Analytics; here's how they handle crawler traffic.
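If you do want a rough tally from the logs anyway, here is a minimal sketch (assuming Apache's combined log format and a Debian-style log path; the marker list is illustrative and far from exhaustive, and a real analytics service does this far better):

    import re
    from collections import Counter

    # Substrings that well-known crawlers place in their User-Agent header.
    BOT_MARKERS = ("googlebot", "bingbot", "slurp", "yandex", "baiduspider",
                   "bot", "crawler", "spider")

    # In Apache's "combined" format the User-Agent is the last quoted field.
    UA_RE = re.compile(r'"([^"]*)"\s*$')

    def classify(log_path):
        counts = Counter()
        with open(log_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                m = UA_RE.search(line)
                ua = m.group(1).lower() if m else ""
                counts["bot" if any(s in ua for s in BOT_MARKERS) else "human"] += 1
        return counts

    print(classify("/var/log/apache2/access.log"))  # assumed log path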
The data from server logs is limited and will unavoidably have a high noise-to-signal ratio, thanks to factors such as bots, caching, and CDNs.
Analyzing page views is a task for page-tag based analytics.
Banning bots is a fruitless activity. The only bots that will obey robots.txt are helpful bots like Googlebot and Bingbot. Malicious bots or even less scrupulous search services' bots will ignore your robots.txt.
Banning bots is a sure way to lose all page ranking with the major search providers, AND your logs will still be full of bot traffic.
It is likely to make your site very difficult or impossible to find in search engines, as they won't send their robots to see what's on your site. They won't know what words you use, so it will be hard for them to tell which searches your site might be relevant to.
However, it is possible your site will still be displayed in search results, particularly if a high-ranking site links to yours. Google, and possibly other engines, may use information from the link alone to decide to show your site in their results pages.
Google may still index pages blocked by robots.txt and may even list them in search results; see Block URLs with robots.txt and Does Google ignore robots.txt.
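This is also why keeping a page out of results entirely is usually done with a noindex directive rather than a robots.txt block: the bot has to be allowed to fetch the page in order to see the directive at all. A minimal Apache sketch (assuming mod_headers is enabled):

    # Ask crawlers not to index responses; they must be allowed to
    # crawl the page to see this header (requires mod_headers).
    Header set X-Robots-Tag "noindex"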
Banning bots means no search engine will be able to get the content of your site.
Ultimately you will not rank for any keywords. It would be next to impossible to find your page on Google. You might get referral traffic but no organic traffic.
Note: robots.txt does not ban bots; it asks them not to crawl the site, which major search engine bots like Google, Yahoo, and Bing comply with.