Crawl Error on Disallowed Content

@Shelley277

Posted in: #GoogleSearchConsole #HttpCode500 #RobotsTxt

I've got a rule in my robots.txt file:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /my_directory/


Yet I am getting entries in my Crawl Errors section of Google Webmaster Tools for this URL:


Googlebot couldn't access the contents of this URL because the server had an internal error when trying to process the request. These errors tend to be with the server itself, not with the request. More info.


I've even tested the exact URL that is being reported in Crawl Errors by entering it manually into the robots.txt Tester in Google Webmaster Tools, and it comes back as "Blocked", as expected.
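(For what it's worth, the same check can be run outside Webmaster Tools with Python's standard urllib.robotparser; example.com and the page path below are just placeholders for my actual site.)

# Sketch: check whether a URL is disallowed for a given user agent,
# using only the Python standard library. Host and path are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt

url = "https://example.com/my_directory/some-page.html"
print(rp.can_fetch("Googlebot", url))  # False means the URL is disallowed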

How can I resolve this?

2 Comments

@Lee4591628

Robots.txt does not block search engines from indexing a URL, only from crawling the content behind it. So what is likely happening is that Googlebot still visits your page's URL even though it knows it isn't allowed to crawl the contents of that page, only index the URL itself.

From Google Webmasters:


Blocking Google from crawling a page is likely to decrease that page's ranking or cause it to drop out altogether over time. It may also reduce the amount of detail provided to users in the text below the search result. This is because without the page's content, the search engine has much less information to work with.

However, robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed.
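If you decide to go the noindex route, here is a rough sketch of the header approach, shown with Python's built-in http.server purely for illustration (the host, port and page body are made up; in practice you would set the X-Robots-Tag header in your real web server, e.g. Apache or nginx):

# Sketch: attach an X-Robots-Tag: noindex header to a response.
# Uses only the Python standard library; not a production setup.
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoIndexHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # Tells crawlers the page may be fetched but must not be indexed.
        self.send_header("X-Robots-Tag", "noindex")
        self.end_headers()
        self.wfile.write(b"<html><body>Page content here</body></html>")

HTTPServer(("localhost", 8000), NoIndexHandler).serve_forever()

Note that for this to work the page must remain crawlable, so the robots.txt Disallow for it has to be removed.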


@Sherry384

Robots.txt blocks only the content from being crawled, not the URL itself. So if the server's response for the "blocked" URL is broken (for example, a 500 internal error), Google can still pick that up and report it.
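A quick way to confirm is to request the URL yourself and look at the status code; this sketch uses Python's standard urllib, and the URL is a placeholder for the one shown in your Crawl Errors report:

# Sketch: fetch the "blocked" URL directly and print the status code
# the server actually returns.
import urllib.request
import urllib.error

url = "https://example.com/my_directory/some-page.html"
try:
    resp = urllib.request.urlopen(url)
    print(resp.status)   # e.g. 200 if the page itself responds normally
except urllib.error.HTTPError as e:
    print(e.code)        # e.g. 500 would match what Googlebot reported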
