
Googlebot is still attempting to crawl old content

@Kristi941

Posted in: #Googlebot

I have content that was deleted several years ago, and from time to time Googlebot still attempts to access those pages, filling my logs with lots of 404s and making the 'real' problems harder to find and read.

I have found "Google is still crawling and indexing my old, dummy, test pages which now are 404 not found", but that question is more about removing pages from the index. My pages are no longer indexed; I'd like Google to stop attempting to open them.

I also believe this is not related to "How to effectively close a page?". The page was closed years ago (maybe we did it wrong back then), but I would like to prevent Google from continuing to crawl those old URLs (they were removed from the index years ago).

Would a 410 work, indicating that the content will not come back? Or is this something we have no control over?


2 Comments


@Holmes151

Better yet: 301 to the correct page.

Edit: since this forum is gone, 301 to a page that explains that the forum has been permanently shut down. Do this because external links still point to interior pages of that dead forum. That's not your fault, but on the other hand you sure enjoyed collecting all that free link juice. Surely this has happened to you: you click a link that seems perfect, only to land on some jerk's homepage and go "WTH, this stinks." You back up, look at the URL, and find the jerk destroyed the content you needed. And in that moment, who are you angry at? Bingo.
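If your server happens to be nginx, that redirect is a couple of lines. A minimal sketch, assuming nginx; the /forum/ prefix and /forum-closed.html page are hypothetical placeholders for your own paths:

    # Permanently redirect every old forum URL to one explanation page
    location /forum/ {
        return 301 /forum-closed.html;
    }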

As far as Google mistaking that for doorway pages: no worries, people shut down forums all the time, and that's certainly better than leaving them up to be stuffed full of spam by robots. I don't know if your platform allows this, but HTTP lets a 404 response carry a full web page as its body: keep the 404 status so crawlers get the message, and put the explanation, plus an ordinary link or meta-refresh to the new location, in the HTML body. (A Location: header alone won't do it; clients only follow that on 3xx redirect responses.)
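On nginx, serving that explanatory page while keeping the error status looks roughly like this (a sketch; /forum/ and /forum-gone.html are hypothetical paths):

    location /forum/ {
        # Send the custom explanation page as the response body,
        # but keep the 404 status code so crawlers get the message
        error_page 404 /forum-gone.html;
        return 404;
    }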

The crawling continues because external sites still link to the old page locations, or because Google knows those pages were bookmarked by users. That is absolutely free link juice that you have earned! If it brings enough traffic to be worth the bother, and the content is any good, restore the forum in archive mode.

Even if you 410, that doesn't mean Google will disappear from the logs forever; as long as those external links and bookmarks live, the Free Traffic Fairy (er, Google) will check up periodically to see if the URLs have Lazarus'd.

As for the access log pollution, filter it out with something like grep -v '/forum-URL-pattern-here/.*404'. Though you may want to keep watching organic traffic to those pages; it is worth money.


@Alves908

Yes, you are absolutely correct. You need to return a 410 to indicate that you have permanently removed the page from your site.

A 404 does not tell the crawler that. Suppose some temporary issue on your site causes a 404 to be served; you would not want the crawler to remove your pages from the index in that case.
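On nginx, returning 410 for the removed URLs is a one-liner inside a location block (a sketch; the /old-pages/ prefix is a hypothetical placeholder):

    location /old-pages/ {
        return 410;  # "Gone": the removal is permanent, unlike a 404
    }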

Also make sure you remove these pages from your sitemap and from any other pages that link to them, and disallow them in robots.txt.

You can find a good fix by checking where the pages that return 404 are linked from. Identify those sources and remove the links there, in addition to returning 410.
