Search engines that index your site despite a disallow in robots.txt
Recently, WordPress changed the wording of its option from "Block search engines" to "Discourage search engines".
That leaves me wondering: does the rewording mean that some search engines disregard your decision about whether your website gets indexed?
Is there a list of those somewhere?
More posts by @Ann8826881
2 Comments
I believe the robots.txt file is just a request asking search engines not to index you.
Nothing actually disables indexing; it is up to each search engine to honor the request.
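To illustrate why the file is only a request, here is a minimal sketch of the check a well-behaved crawler runs on its own side, using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the bot name `ExampleBot`, and the `example.com` URL are made up for illustration. The point is that the crawler decides whether to ask the question at all; the server never enforces the answer.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper: a polite crawler would call something like this
    before each request. A crawler that skips the call is not stopped by
    anything in robots.txt itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are placeholders, not real crawler names.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/page"))
```

A crawler that never performs this check can still request any page, which is presumably why the option now says "Discourage" rather than "Block".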
Even Google will occasionally include a site in its search index, despite the fact that the site disallows all crawling with robots.txt. Google views robots.txt rules as preventing crawling, but not necessarily inclusion in the search index.
When Googlebot can't crawl a site, but the site has lots of external links, Google may include pages from the site in the index and use the anchor text of the links to determine what the page is about.
Google's Matt Cutts blogged about this. He gave the example of the California Department of Motor Vehicles. Its website disallowed crawling, but it would be ridiculous if people couldn't find it on Google. So Google came up with a way of including sites like that without being able to crawl them.
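The difference between crawling and indexing described in this answer can be sketched as a toy model. This is not Google's actual algorithm, only an illustration of the idea that a page can be listed from what other pages say about it. The function name `build_link_only_index`, the hosts, and the anchor texts are all invented for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_link_only_index(links, blocked_hosts):
    """Toy model: describe uncrawlable pages using inbound anchor text only.

    links         -- iterable of (target_url, anchor_text) pairs found on
                     other sites that the crawler was allowed to read
    blocked_hosts -- hosts whose robots.txt disallows crawling (assumed known)
    """
    index = defaultdict(list)
    for target_url, anchor_text in links:
        host = urlsplit(target_url).hostname
        if host in blocked_hosts:
            # The page itself is never fetched; the only description
            # available is what other sites wrote in their links to it.
            index[target_url].append(anchor_text)
    return dict(index)


if __name__ == "__main__":
    links = [
        ("https://blocked.example/", "state motor vehicle office"),
        ("https://blocked.example/", "renew a driver license"),
        ("https://open.example/", "some other site"),
    ]
    print(build_link_only_index(links, blocked_hosts={"blocked.example"}))
```

In this sketch the blocked site still ends up with an index entry, built entirely from external links, even though none of its pages were ever requested.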