Grapeshot crawler ignoring robots.txt

Posted in: #RobotsTxt #Spam #WebCrawlers

Has anyone come across a crawler called Grapeshot? They are hammering the same page on our website repeatedly. I believe they are looking for ad-related keywords, based on previous content ad campaigns. The odd thing is that we never ran any such campaigns on the page they are so interested in. We have only a few pages running AdSense; is this what has attracted Grapeshot?

I've added the following declaration to my robots.txt, but they don't seem to be honouring it:

User-agent: grapeshot
Disallow: /

Any ideas on how to block this nuisance crawler? I'm starting to think the best way is to set up IP rules in IIS.


@Alves908


2 Comments


@Ogunnowo487

The Grapeshot crawler should honor your robots.txt, as documented on their site:

"With a robots.txt files you may block the Grapeshot Crawler from parts or all of your site […]"
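Before concluding that the crawler is ignoring the file, it may be worth confirming that the rule from the question parses the way you expect. Below is a minimal sketch using Python's standard `urllib.robotparser`. The token `GrapeshotCrawler` and the example URL are assumptions for illustration, and Python's matching rules are not necessarily the ones Grapeshot's crawler applies.

```python
from urllib.robotparser import RobotFileParser

# The rule from the question, held inline so the check runs offline.
ROBOTS_TXT = """\
User-agent: grapeshot
Disallow: /
"""


def is_allowed(user_agent_token: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if Python's robots.txt parser would let this agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent_token, url)


if __name__ == "__main__":
    # "GrapeshotCrawler" is an assumed product token, not taken from the thread.
    print(is_allowed("GrapeshotCrawler", "http://www.example.com/some-page"))
    print(is_allowed("Googlebot", "http://www.example.com/some-page"))
```

If the first call reports the page as allowed, the problem is in the robots.txt itself rather than in the crawler's behaviour.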

Perhaps the requests are not coming from the real Grapeshot crawler. You could check the IP address, as their documentation describes:

"The Grapeshot crawler can be identified by requests coming from Grapeshot owned IP address ranges, if you are suspicious about requests being spoofed you should first check the IP address of the request against the appropriate RIPE database, using a suitable whois tool or lookup service. In general the only valid addresses you should be seeing are in the address range 89.145.95.0 to 89.145.95.255 (89.145.95.0/24). At time of writing the only addresses in use for Grapeshot crawlers are 89.145.95.2, 89.145.95.41 and 89.145.95.42."
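As a rough illustration of that check, here is a small Python sketch that tests whether an address taken from your server logs falls inside the 89.145.95.0/24 range quoted above. The function name is hypothetical, and the range is only as current as the quoted documentation, so a whois lookup remains the authoritative check.

```python
import ipaddress

# Range quoted from Grapeshot's documentation; it may have changed since.
GRAPESHOT_RANGE = ipaddress.ip_network("89.145.95.0/24")


def is_in_grapeshot_range(ip: str) -> bool:
    """Return True if ip parses as an address inside the quoted Grapeshot range."""
    try:
        return ipaddress.ip_address(ip) in GRAPESHOT_RANGE
    except ValueError:
        # Not a valid IP address at all (for example, a malformed log field).
        return False


if __name__ == "__main__":
    for candidate in ("89.145.95.41", "89.145.96.1", "not-an-ip"):
        print(candidate, is_in_grapeshot_range(candidate))
```

Requests that claim to be Grapeshot in the user agent but come from outside this range are likely spoofed, in which case robots.txt would not be expected to help.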

If it is the real crawler, and you have given it a few days to notice your changed robots.txt, you should contact Grapeshot's crawler support.


@YK1175434

Some bots ignore robots.txt rules. In that case, block the user agent at the server level and return a 403 Forbidden HTTP response.

On IIS, you can block requests by user agent. The procedure is described on moz.com: moz.com/ugc/blocking-bots-based-on-useragent
I haven't reproduced the procedure here because it would be too long.
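To show the logic only (this is not IIS configuration), here is a minimal Python WSGI sketch of the same idea: look for a blocked token in the User-Agent header and answer with 403 Forbidden. The names `block_user_agents`, `hello_app` and `status_for`, and the sample user-agent strings, are all hypothetical.

```python
from wsgiref.util import setup_testing_defaults

# Lower-case substrings to look for in the User-Agent header (assumed list).
BLOCKED_TOKENS = ("grapeshot",)


def block_user_agents(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app so requests from blocked user agents receive a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def status_for(user_agent: str) -> str:
    """Send one fake request through the middleware and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []
    app = block_user_agents(hello_app)
    b"".join(app(environ, lambda status, headers: seen.append(status)))
    return seen[0]


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; GrapeshotCrawler/2.0)"))
    print(status_for("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"))
```

Keep in mind that user-agent matching only stops clients that identify themselves honestly; a spoofed or changed user agent would need IP rules instead.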
