Crawl Error on Disallowed Content

Posted in: #GoogleSearchConsole #HttpCode500 #RobotsTxt

I've got a rule in my robots.txt file:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /my_directory/

Yet I am getting entries in my Crawl Errors section of Google Webmaster Tools for this URL:

Googlebot couldn't access the contents of this URL because the server had an internal error when trying to process the request. These errors tend to be with the server itself, not with the request.

I've even tested manually entering the URL in the robots.txt Tester in Google Webmaster Tools and it returned "Blocked" as expected for the exact URL that is being reported as an error in Crawl Errors.

How can I resolve this?

@Shelley277


2 Comments


@Lee4591628

robots.txt does not stop search engines from indexing a URL, only from crawling the content behind it. So what is likely happening is that Googlebot still visits the URL even though it knows it can't index the contents of that page, only the URL itself.
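To make the distinction concrete: a Disallow rule only answers the question "may this crawler fetch this path?". Here is a minimal sketch of that check using Python's standard urllib.robotparser. The helper name is_blocked and the example.com URLs are my own placeholders, not anything from the question. The rule is written without the blank line after User-agent because, as far as I can tell, Python's parser treats a blank line as the end of a record.

```python
from urllib.robotparser import RobotFileParser

# The rule from the question, written without comments or blank lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /my_directory/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# example.com and both paths are placeholders, not the asker's site.
blocked = is_blocked(
    ROBOTS_TXT, "Googlebot", "https://example.com/my_directory/page.html"
)
allowed = not is_blocked(
    ROBOTS_TXT, "Googlebot", "https://example.com/other/page.html"
)
```

A "blocked" result here says nothing about whether the URL can show up in the index or in reports; it only describes the crawl rule.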

From Google Webmasters:

Blocking Google from crawling a page is likely to decrease that page's ranking or cause it to drop out altogether over time. It may also reduce the amount of detail provided to users in the text below the search result. This is because without the page's content, the search engine has much less information to work with.

However, robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed.
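If you go the X-Robots-Tag route described in that quote, the header has to be sent by whatever serves the page. Below is a rough sketch as a tiny WSGI application in Python. The names NOINDEX_PREFIX and app are hypothetical, and your real server or framework will have its own way of setting response headers, so treat this as an illustration of the idea only.

```python
# Hypothetical prefix, taken from the Disallow rule in the question.
NOINDEX_PREFIX = "/my_directory/"


def app(environ, start_response):
    """Minimal WSGI app that marks one directory as noindex."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIX):
        # Ask crawlers not to index this response. The path must NOT be
        # disallowed in robots.txt, or the header will never be seen.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<html><body>ok</body></html>"]
```

To try it locally, the app can be passed to wsgiref.simple_server.make_server from the standard library.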


@Sherry384

robots.txt blocks only the content, not the URL. So if the server returns a wrong response for the "blocked" URL, such as an internal server error, Google can still pick it up.
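One way to see what the server actually returns for that URL is to request it yourself and look at the status code. This is a small sketch using Python's standard urllib; the function names and the default User-Agent string are made up for illustration, and you would pass in your own URL.

```python
import urllib.error
import urllib.request


def is_server_error(status: int) -> bool:
    """Return True for 5xx codes, the class reported in the question."""
    return 500 <= status <= 599


def fetch_status(url: str, user_agent: str = "status-check-sketch") -> int:
    """Request url and return the HTTP status code the server sends."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is on the exception.
        return error.code
```

If is_server_error(fetch_status(your_url)) comes back True, the error is reproducible on the server itself, whatever robots.txt says.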
