Angie530

Why doesn't this web crawler obey requests to stop crawling my pages?

@Angie530

Posted in: #HttpHeaders #WebCrawlers

I have a hidden page that verifies internal payments from PayPal. The page has the following in its HTTP headers:

X-Robots-Tag: noindex, nofollow


But once in a while I see this in the page logs:

HOST: 208-115-111-71-reverse.wowrack.com
USER_AGENT: Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)


What is wowrack.com, and how do I stop it from crawling my pages?


2 Comments


 

@Rivera981

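The reason is that the directive in your X-Robots-Tag header controls indexing, not crawling. A crawler has to request the page before it can even see the header, so the header cannot stop the request itself.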

[EDIT] Explicit reference to this point is made here: developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag

This document details how the page-level indexing settings allow you to control how Google makes content available through search results. You can specify these by including a meta tag on (X)HTML pages or in an HTTP header.


(emphasis mine)

[/END EDIT]

To prevent crawling of this page, you should consider blocking this specific page using an exclusion in your robots.txt file. More information here: www.robotstxt.org/robotstxt.html
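As a rough illustration of such an exclusion, the sketch below builds a small robots.txt and checks it with Python's standard urllib.robotparser. The path /paypal-verify.php is a hypothetical placeholder for the hidden page (the question does not give the real one), and the check only shows what a well-behaved crawler would conclude; it does not stop a crawler that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path of the hidden page; replace it with the real one.
HIDDEN_PATH = "/paypal-verify.php"

# robots.txt content that asks every crawler to stay away from that path.
ROBOTS_TXT = f"""\
User-agent: *
Disallow: {HIDDEN_PATH}
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if a compliant crawler may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(ROBOTS_TXT)
    print("hidden page allowed:", is_allowed(ROBOTS_TXT, "Ezooms", HIDDEN_PATH))
    print("home page allowed:", is_allowed(ROBOTS_TXT, "Ezooms", "/index.html"))
```

The two lines in ROBOTS_TXT are what would go into the robots.txt file at the root of the site.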

If this doesn't work (as not all crawlers respect this file), then you can look at blocking by IP or domain.



 

@BetL925

Several web crawlers don't respect the X-Robots-Tag in HTTP headers. I think this is the case for the crawler coming from wowrack.com.

To block this crawler from your page, you can use .htaccess (if your web server is Apache). Add these lines to your .htaccess file to block its IP address range (this is the Apache 2.2 syntax; Apache 2.4 still accepts it through mod_access_compat):

# Deny every address that starts with 208.115.111. and allow everyone else
order allow,deny
deny from 208.115.111.
allow from all
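To see which addresses a partial address like 208.115.111. covers, here is a minimal Python sketch using the standard ipaddress module. It treats the prefix as the network 208.115.111.0/24, and it assumes (this is a guess based on the logged hostname, not something the logs confirm) that the first four dash-separated parts of the reverse hostname spell out the client IP.

```python
import ipaddress

# "deny from 208.115.111." matches every address in this /24 network.
BLOCKED_NETWORK = ipaddress.ip_network("208.115.111.0/24")


def ip_from_reverse_host(host: str) -> ipaddress.IPv4Address:
    """Read the IP out of a hostname such as 208-115-111-71-reverse.wowrack.com.

    Assumption: the first four dash-separated parts are the octets.
    """
    octets = host.split("-")[:4]
    return ipaddress.IPv4Address(".".join(octets))


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside the denied range."""
    return ipaddress.ip_address(ip) in BLOCKED_NETWORK


if __name__ == "__main__":
    crawler_ip = ip_from_reverse_host("208-115-111-71-reverse.wowrack.com")
    print(crawler_ip, "blocked:", is_blocked(str(crawler_ip)))
    print("208.115.112.1 blocked:", is_blocked("208.115.112.1"))
```

Blocking the whole /24 also shuts out any other client in that range, which is the trade-off of matching on a prefix instead of a single address.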


To be more aggressive, you can also block by hostname, but this is not the best solution, because Apache then has to perform a reverse DNS lookup on incoming requests:

order allow,deny
deny from wowrack.com
allow from all


