The best approach will most likely involve getting the IP address of the visitor to the page, performing a reverse DNS lookup, and checking whether the resulting host name matches the known list of web crawlers. As far as I know, this is pretty much foolproof (discounting DNS spoofing, which is unlikely to be a major problem).
For the Google web crawler, this is described in the blog post How to verify Googlebot.
Here's a list of the domain name wildcards for the most common spider bots/web crawlers:
Google (Googlebot): *.googlebot.com
Bing (msnbot): (Not resolvable, see IP ranges)
Yahoo (Yahoo Slurp): *.yahoo.com
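The reverse-then-forward verification described above can be sketched in Python. The suffix list here is an assumption for illustration (extend it with the wildcards above); the forward lookup in step 3 is what makes the check robust, since only the search engine controls its own forward DNS zone:

```python
import socket

# Assumed list of trusted crawler host-name suffixes; extend as needed.
CRAWLER_SUFFIXES = (".googlebot.com", ".yahoo.com")

def is_verified_crawler(ip: str) -> bool:
    """Verify a claimed crawler IP via reverse DNS plus a forward-confirming lookup."""
    try:
        # Step 1: reverse lookup -- IP address -> host name.
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False

    # Step 2: the host name must end with a known crawler domain.
    if not hostname.endswith(CRAWLER_SUFFIXES):
        return False

    # Step 3: forward lookup -- host name -> IP addresses. This guards
    # against spoofed PTR records: the original IP must appear among
    # the addresses the crawler's own domain resolves to.
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```

This is only a sketch; in production you would cache verification results per IP, since doing two DNS lookups on every request is expensive.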
Though I'm not sure how often the IP address ranges for the various main crawlers change, there's also this page which lists such ranges for the three main search engines.
(Note: I believe the bots do set the user-agent HTTP header on requests, but this is very easy to fake of course.)
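Even though the User-Agent header is trivially forgeable, it is still useful as a cheap first-pass filter, so the slower DNS verification only runs for requests that actually claim to be a bot. A minimal sketch (the token list is an assumption for illustration):

```python
# Assumed User-Agent substrings for the common crawlers above.
BOT_UA_TOKENS = ("Googlebot", "msnbot", "Slurp")

def claims_to_be_bot(user_agent: str) -> bool:
    # A match is only a hint, never proof -- the header is easy to fake.
    return any(token in user_agent for token in BOT_UA_TOKENS)
```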
Hope this helps.