Bandwidth saturation due to Googlebot requests for spam pages

@Fox8124981

Posted in: #Bandwidth #Googlebot #Spam

I recently had to upgrade the hosting plan of a WordPress site I run due to excess bandwidth usage, which was already quite high.
I've installed a security plugin that keeps a log of 404 errors. From this log I can see a large number of requests for spam pages coming from an IP address that resolves to Googlebot.
I've disallowed all bots from crawling my pages, and that seems to have stopped the bulk requests for non-existent pages. Of course, this is only a temporary solution to keep my site from being blocked. How can I stop Googlebot from requesting just those pages, rather than blocking everything?


2 Comments


@Yeniel560

If the URLs being requested follow some pattern, I think it is a good idea to block those pages. Based on your comment above, you could add this to your robots.txt:

User-agent: *
Disallow: /data/


That will work as long as you don't have any real URLs beginning with data.
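
Google also honours Allow directives, and the longer (more specific) matching rule wins. So if part of that area is legitimate (a hypothetical /data/reports/ section, say), you could carve out an exception instead of blocking everything under /data/; a minimal sketch:

User-agent: *
Disallow: /data/
Allow: /data/reports/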

The second thing you may want to do is decrease Google's crawl rate. You can do this in Google Webmaster Tools.

There is also a Crawl-delay setting in robots.txt - Google ignores this but it is useful for other search engines. I'd advise not setting it higher than 4.
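
As a rough sketch, a combined file covering both suggestions might look like the following. The value 4 means roughly one request every four seconds for crawlers that honour the directive (Bing, for example); Googlebot's rate still has to be adjusted in Webmaster Tools:

User-agent: *
Disallow: /data/
# Ignored by Google; throttles other crawlers to about one request every 4 seconds
Crawl-delay: 4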


@Sherry384

404 errors are a normal part of doing business. They can come from poorly formed links on other sites, from broken links on your own site, or from pages that have been removed even though no current link points to them.

You do not want to block search engines from accessing pages that do not exist unless you do not want to be in the search engines at all.

If you block Google with a robots.txt file, for example, your pages will eventually be dropped from the index. You are effectively telling Google that you do not want what it is offering.

You do want to allow 404 errors to occur. How else can Google know these pages are gone? You can also replace the 404 with a 410, which shortens the process. A 404 simply means "not found", with no indication of whether the page might come back, while a 410 means the page is gone for good. Google will keep requesting a URL that returns a 404 for a period of time, over a number of visits, before it decides the page is truly gone. If you return a 410 instead, Google will stop requesting the page much sooner, typically after a single visit. But keep in mind that any search engine may try again later if it finds a link on another site, just in case.
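
If you go the 410 route and the spam URLs share a common pattern (the /data/ prefix assumed in the other answer, for instance), and assuming the site runs on Apache as most WordPress hosts do, a sketch using mod_alias in .htaccess could look like this:

# Return "410 Gone" for anything under the assumed /data/ spam prefix
RedirectMatch gone ^/data/

The bare 410 response is also much smaller than a full WordPress 404 page, which helps with the bandwidth problem itself.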
