Google crawling old non-existent pages

@Sims2060225

Posted in: #Seo

I noticed in my Webmaster Tools account that Google is crawling old site URLs
that no longer exist.

Can anybody tell me where GoogleBot is getting these URLs?


4 Comments


 

@Sent6035632

I've asked this question many times over the years. The short answer is that Googlebot never forgets a URL it has seen, and if you're properly returning a 301, 404, or 410 response code, you shouldn't have to worry any further than that.

The longer answer is that Google collects URLs from several different sources, and those URLs get fed to Googlebot. If a URL ever responds with a 200 status code, you can bet that Googlebot will come back to it for years, even after that URL has been 301'd, 404'd, or 410'd. Basically, Google allocates a certain daily amount of bandwidth and URLs to a site; when they feel they have additional bandwidth, or they haven't tried some old URLs in a while, they'll actively crawl some of your really old stuff.

A few older sites I've worked on have gone through two or three redesigns and platform changes, which meant switching extensions (for example from .aspx to .php) or moving to a completely new URL structure. Without fail, Googlebot continually requests old URLs that have been 301'd or 410'd for years.

Unless the Googlebot traffic is affecting your server, I would just ignore those old URL requests. Just make sure you return the response you want to convey to Googlebot (typically 301, 404, or 410). If it's causing unnecessary load on your server, you can always block the URLs in robots.txt.
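
As a minimal sketch of the robots.txt route, assuming the old pages lived under prefixes like /old-site/ and /legacy/ (placeholder paths, not taken from the question):

# robots.txt -- hypothetical old-URL prefixes to keep Googlebot away from
User-agent: Googlebot
Disallow: /old-site/
Disallow: /legacy/

One trade-off to keep in mind: once a path is disallowed, Googlebot can no longer see the 301, 404, or 410 you return there, so you give up the removal signal in exchange for the reduced crawl load.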



 

@Goswami781

Basically, if the old URLs changed to new URLs, you should do a 301 redirect from each old URL to its new URL.

If the old URLs are simply not there anymore (gone, obsolete, whatever), then return a 410 status for them. That tells Google the pages are gone and not to bother checking for them in the future (it'll still check for them once in a long while, but the effect is the same).
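
A rough sketch of both cases in .htaccess using mod_alias, with made-up paths rather than anything from the question:

# Old URL that has a new home: permanent (301) redirect
Redirect 301 /old-page.html /new-page.html

# Old URL with no replacement: tell crawlers it is gone for good (410)
Redirect gone /discontinued-page.html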



 

@Pope3001725

Once Google crawls a URL, it is added to the index and will be crawled continually until Google is given a reason to stop. 404 responses will eventually get Google to stop crawling those pages, but what you should have done is set up 301 redirects from the old URLs to the new URLs. That would have told Google those pages have moved and to update its index, and it would have associated any links pointing to the old pages with their new URLs. Basically, by not doing the 301 redirects you started over from scratch as far as SEO goes.
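
If the old URLs followed a predictable pattern, a single mod_rewrite rule in .htaccess could map them to the new structure in bulk; the /old/ and /new/ prefixes below are invented for the example, so adjust them to your own structure:

RewriteEngine On
# Permanently (301) redirect everything under the old section to the same path under the new one
RewriteRule ^old/(.*)$ /new/$1 [R=301,L]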



 

@Murray432

If the links still exist on other sites, Google will continue to look for them on your site; there isn't much you can do to stop that if you can't get the links removed.

The only thing you can do to stop Google looking for dead content is to add the code below to your .htaccess file; the 410 Gone response will let Google know the content is gone for good.

Google may not stop looking, but at least it will stop generating 404s each time it does.

#Stuff to 410
# mod_alias "Redirect gone" returns a 410 Gone response for a URL-path
Redirect gone /path/to/page.html
Redirect gone /directory-path/
# Note: the first argument must be a URL-path starting with "/"; a bare hostname
# such as foo.domain.com will not match, so handle a retired subdomain in that
# host's own configuration instead


Otherwise, if you want Google to look for the content elsewhere, follow @JohnConde's advice on 301 redirects.


