Why doesn't this web crawler obey requests to stop crawling my pages?

I have a hidden page that verifies internal payments from PayPal. The page has the following in its HTTP headers:
X-Robots-Tag: noindex, nofollow
But once in a while I see this in the page logs:
HOST: 208-115-111-71-reverse.wowrack.com
USER_AGENT: Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)
What is that wowrack.com? And how do I stop it crawling my pages?
2 Comments
The reason is that the X-Robots-Tag directive controls indexing, not crawling: noindex and nofollow tell a compliant search engine not to index the page or follow its links, but they say nothing about fetching the page in the first place. The bot has to crawl the page before it can even see the header.
[EDIT] Explicit reference to this point is made here: developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
That document explains that the page-level indexing settings allow you to control how Google makes content available through search results, and that you can specify these by including a meta tag on (X)HTML pages or in an HTTP header.
(emphasis mine)
[/END EDIT]
To prevent crawling of this page, block it with an exclusion rule in your robots.txt file. More information here: www.robotstxt.org/robotstxt.html
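For example, assuming the page lives at a path such as /paypal-verify (a hypothetical path — substitute the real URL of your verification page), the robots.txt entry would look like:

```text
User-agent: *
Disallow: /paypal-verify
```

Keep in mind that robots.txt is purely advisory: it only stops crawlers that choose to honor it, and it also publicly lists the path, so don't rely on it to hide anything sensitive.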
If this doesn't work (as not all crawlers respect this file), then you can look at blocking by IP or domain.
Several web crawlers don't respect the X-Robots-Tag header at all. That appears to be the case here: Ezooms is the bot doing the crawling, and wowrack.com is just the hosting provider whose reverse-DNS name shows up in your logs.
To block this crawler, you can use .htaccess (if you use Apache as your web server). Add these lines to your .htaccess file to block the IP range:
order allow,deny
deny from 208.115.111.
allow from all
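Note that Order/Deny/Allow is the Apache 2.2 syntax. On Apache 2.4 and later those directives are deprecated; the equivalent (a sketch using mod_authz_core, which ships with 2.4) is:

```apache
# Apache 2.4+: allow everyone except the 208.115.111.* range
<RequireAll>
    Require all granted
    Require not ip 208.115.111
</RequireAll>
```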
To be more aggressive, you can also block by hostname, but this is not the best solution: it forces Apache to perform a double reverse-DNS lookup on every request, which is slow and unreliable:
order allow,deny
deny from wowrack.com
allow from all
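Since this bot identifies itself as Ezooms in its User-Agent string, another option is to refuse requests based on the User-Agent. A sketch, assuming mod_rewrite is enabled:

```apache
RewriteEngine On
# Match "Ezooms" anywhere in the User-Agent, case-insensitively
RewriteCond %{HTTP_USER_AGENT} Ezooms [NC]
# Answer with 403 Forbidden instead of serving the page
RewriteRule .* - [F,L]
```

User-Agent strings are trivially spoofed, so treat this as a convenience filter rather than a security measure; the IP-based block above is harder to evade.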