
Grapeshot crawler ignoring robots.txt

@Alves908

Posted in: #RobotsTxt #Spam #WebCrawlers

Has anyone come across a crawler called Grapeshot? It is hammering the same page on our website repeatedly. I believe it is looking for ad-related keywords, based on previous content ad campaigns. The odd thing is that we never ran any such campaigns on the page it is so interested in. We only have a few pages running AdSense; is that what has attracted Grapeshot?

I've added the following declaration to my robots.txt, but the crawler doesn't seem to be honouring it:

User-agent: grapeshot
Disallow: /


Any ideas on how to block this nuisance crawler? I'm starting to think the best way is to set up IP rules in IIS.


2 Comments


@Ogunnowo487

The Grapeshot crawler should honor your robots.txt, as documented on their site:


With a robots.txt files you may block the Grapeshot Crawler from parts or all of your site […]
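Before blaming the crawler, it may be worth confirming that the rule itself parses the way you expect. A minimal sketch using Python's standard `urllib.robotparser`, fed with the two lines from the question; the `GrapeshotCrawler` product token and the `/some-page` path are assumptions for illustration, so check your logs for the user-agent actually sent:

```python
from urllib.robotparser import RobotFileParser

# The two lines from the question's robots.txt.
ROBOTS_TXT = """\
User-agent: grapeshot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "GrapeshotCrawler" is an assumed product token, not taken from the thread.
# robotparser matches the rule's agent name as a case-insensitive substring.
for agent in ("grapeshot", "GrapeshotCrawler", "Googlebot"):
    allowed = parser.can_fetch(agent, "/some-page")  # hypothetical path
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This only shows how one parser reads the file; a given crawler may match user-agent names differently.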


Maybe it’s not the real Grapeshot crawler visiting your site? You could check the IP address:


The Grapeshot crawler can be identified by requests coming from Grapeshot owned IP address ranges, if you are suspicious about requests being spoofed you should first check the IP address of the request against the appropriate RIPE database, using a suitable whois tool or lookup service. In general the only valid addresses you should be seeing are in the address range 89.145.95.0 to 89.145.95.255 (89.145.95.0/24). At time of writing the only addresses in use for Grapeshot crawlers are 89.145.95.2, 89.145.95.41 and 89.145.95.42.
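To compare the addresses in your logs against that range, something like the following could work. It is a rough sketch using Python's standard `ipaddress` module; the range is the one quoted above and may have changed since, and the helper name `is_grapeshot_ip` is made up for this example:

```python
import ipaddress

# Range quoted from Grapeshot's documentation above; it may be out of date.
GRAPESHOT_NETWORK = ipaddress.ip_network("89.145.95.0/24")


def is_grapeshot_ip(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside the documented Grapeshot range."""
    try:
        return ipaddress.ip_address(remote_addr) in GRAPESHOT_NETWORK
    except ValueError:
        # Not a valid IP address at all (e.g. a malformed log entry).
        return False


# Example: addresses as they might appear in a server log.
for addr in ("89.145.95.41", "89.145.96.1", "not-an-ip"):
    print(addr, is_grapeshot_ip(addr))
```

A request that claims to be Grapeshot but comes from outside the range is likely spoofed, in which case robots.txt will not help and a server-side block is the better option.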


If it’s the real crawler, and you have given it a few days to notice your changed robots.txt, you should contact Grapeshot's crawler support.


@YK1175434

Several bots don't follow robots.txt declarations. In that case, block the user-agent at the server level and return a 403 Forbidden HTTP response.

On IIS, you can block requests by user-agent. The procedure is described on moz.com: moz.com/ugc/blocking-bots-based-on-useragent
I haven't reproduced it here because it would be too long.
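Just to illustrate the idea (this is not IIS configuration), here is a minimal Python WSGI sketch of the same technique: inspect the User-Agent header and answer 403 Forbidden on a match. The `grapeshot` substring and the names `block_user_agents` and `BLOCKED_TOKENS` are assumptions for the example:

```python
# Assumption: this substring appears in the crawler's User-Agent header.
BLOCKED_TOKENS = ("grapeshot",)


def block_user_agents(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked token."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Everything else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Keep in mind that a user-agent check only stops clients that identify themselves honestly; for spoofed or anonymous traffic, IP rules are the more reliable tool.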
