
Search engines that index your site despite a disallow in robots.txt

@Ann8826881

Posted in: #Indexing #RobotsTxt #SearchEngines #Seo #WebCrawlers

Recently, WordPress changed their options from "Block search engines" to "Discourage search engines"



That leaves me wondering: does the rewording mean that some search engines disregard your choice about whether your website should be indexed?

Is there a list of those somewhere?


2 Comments


@Kaufman445

I believe the robots.txt file is just a request asking search engines not to index you.
It does not actually disable indexing; it is up to each search engine to honor the request.
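
To illustrate why robots.txt is only a request, here is a minimal sketch of how a well-behaved crawler might consult it, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the `example.com` URLs, and the helper `is_allowed` are assumptions made up for this example. Nothing on the server enforces the answer; a crawler that never runs a check like this can fetch the page anyway.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    This is the voluntary check a polite crawler performs on its own side.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and the URL are hypothetical, used only for illustration.
allowed = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/page.html")
print(allowed)
```

The check runs entirely inside the crawler, which is why the setting can only "discourage" rather than "block".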


@Heady270

Even Google will occasionally include a site in its search index, despite the fact that the site disallows all crawling with robots.txt. Google views robots.txt rules as preventing crawling, but not necessarily inclusion in the search index.

When Googlebot can't crawl a site, but the site has lots of external links, Google may include pages from the site in the index and use the anchor text of the links to determine what the page is about.
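
As a rough illustration of the idea, and not Google's actual method, the sketch below guesses what an uncrawled page is about purely from the anchor text of links pointing at it. The function `describe_from_anchors` and the sample anchor texts are hypothetical.

```python
from collections import Counter


def describe_from_anchors(anchor_texts, top_n=3):
    """Return the most frequent words across anchor texts, most common first.

    A toy stand-in for inferring a page's topic without crawling the page.
    """
    words = Counter()
    for text in anchor_texts:
        words.update(text.lower().split())
    return [word for word, _ in words.most_common(top_n)]


# Hypothetical anchor texts found on other sites that link to the page.
anchors = ["California DMV", "DMV California", "DMV office"]
print(describe_from_anchors(anchors, top_n=2))
```

The page itself is never fetched here; everything comes from other sites, so a disallow rule in robots.txt has no effect on it.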

Google's Matt Cutts blogged about this. He gave the example of the California Department of Motor Vehicles: its website disallowed crawling, but it would be ridiculous if people couldn't find it on Google. So Google came up with a way of including sites like that in the index without being able to crawl them.
