
Thousands of 404 errors in Google Webmaster Tools

@Angela700

Posted in: #Google #GoogleSearchConsole #Links

Because of an old error in our ASP.Net application, created by my predecessor and undiscovered for a long time, thousands of wrong URLs were created dynamically. Normal users never noticed, but Google followed these links and crawled its way through the incorrect URLs, generating more and more wrong links.

To make it clearer, consider the URL


example.com/folder


should have created the link


example.com/folder/subfolder


but was creating


example.com/subfolder


instead. Because of bad URL rewriting, these URLs were accepted and the index page was shown by default for any unknown URL, which created more and more links like this:


example.com/subfolder/subfolder/....
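
Roughly speaking, the bug amounted to something like this (a simplified C# sketch, not the actual code; paths and variable names are purely illustrative):

// Simplified sketch of the kind of link-building bug described above.
string currentPath = "/folder";    // the page currently being rendered
string child = "subfolder";

// Buggy: builds the link from the site root and drops the current folder,
// so the generated URL points at a page that does not exist.
string buggyLink = "/" + child;                                // "/subfolder"

// Intended: keeps the current folder in the generated link.
string intendedLink = currentPath.TrimEnd('/') + "/" + child;  // "/folder/subfolder"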


The problem itself has since been fixed, but I now have thousands of 404 errors listed in Google Webmaster Tools, some of which were discovered one or two years ago, and more keep turning up.

Unfortunately, the links do not follow a common pattern that I could disallow for crawling in robots.txt.

Is there anything I can do to stop Google from retrying those very old links and to remove the already listed 404s from Webmaster Tools?


6 Comments


 

@Eichhorn148

Here is what Google's John Mueller (who works on Webmaster Tools and Sitemaps) has to say about 404 errors that appear in Webmaster Tools:


HELP! MY SITE HAS 939 CRAWL ERRORS!!1

I see this kind of question several times a week; you’re not alone - many websites have crawl errors.


1. 404 errors on invalid URLs do not harm your site’s indexing or ranking in any way. It doesn’t matter if there are 100 or 10 million, they won’t harm your site’s ranking. googlewebmastercentral.blogspot.ch/2011/05/do-404s-hurt-my-site.html

2. In some cases, crawl errors may come from a legitimate structural issue within your website or CMS. How do you tell? Double-check the origin of the crawl error. If there's a broken link on your site, in your page's static HTML, then that's always worth fixing. (thanks +Martino Mosna)

3. What about the funky URLs that are “clearly broken?” When our algorithms like your site, they may try to find more great content on it, for example by trying to discover new URLs in JavaScript. If we try those “URLs” and find a 404, that’s great and expected. We just don’t want to miss anything important (insert overly-attached Googlebot meme here). support.google.com/webmasters/bin/answer.py?answer=1154698

4. You don’t need to fix crawl errors in Webmaster Tools. The “mark as fixed” feature is only to help you, if you want to keep track of your progress there; it does not change anything in our web-search pipeline, so feel free to ignore it if you don’t need it. support.google.com/webmasters/bin/answer.py?answer=2467403

5. We list crawl errors in Webmaster Tools by priority, which is based on several factors. If the first page of crawl errors is clearly irrelevant, you probably won’t find important crawl errors on further pages. googlewebmastercentral.blogspot.ch/2012/03/crawl-errors-next-generation.html

6. There’s no need to “fix” crawl errors on your website. Finding 404’s is normal and expected of a healthy, well-configured website. If you have an equivalent new URL, then redirecting to it is a good practice. Otherwise, you should not create fake content, you should not redirect to your homepage, and you shouldn’t robots.txt disallow those URLs -- all of these things make it harder for us to recognize your site’s structure and process it properly. We call these “soft 404” errors. support.google.com/webmasters/bin/answer.py?answer=181708

7. Obviously - if these crawl errors are showing up for URLs that you care about, perhaps URLs in your Sitemap file, then that’s something you should take action on immediately. If Googlebot can’t crawl your important URLs, then they may get dropped from our search results, and users might not be able to access them either.



 

@Goswami781

This may not have been true when the question was originally asked, but through Webmaster Tools you can now pick which URLs that result in 404s Google should remove from its index and not try to crawl again. You can do 25 at a time. You can find this facility under Health > Crawl Errors.



 

@Deb1703797

Block those pages with robots.txt; that's the easiest route.

My site has over 100k 404 errors that don't seem to die. Sometimes you just have to leave them be.
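
If the bad URLs can at least be grouped under a handful of path prefixes, the robots.txt entries could look something like this (the paths here are purely hypothetical; the question notes the URLs don't share a common pattern):

User-agent: *
Disallow: /subfolder/
Disallow: /another-bad-prefix/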



 

@Ravi8258870

If you run a script to display the pages, you can detect that the request is for one of these problematic pages and serve a true HTML page with a 200 status plus a meta tag:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
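
A rough ASP.NET Web Forms sketch of that approach might look like this (IsKnownUrl() stands in for whatever logic recognises the bogus URLs, and the page's head element needs runat="server"):

// Hypothetical sketch only: serve the page normally (200) but add a
// robots noindex/nofollow meta tag when the URL is one of the bad ones.
// HtmlMeta lives in System.Web.UI.HtmlControls.
protected void Page_Load(object sender, EventArgs e)
{
    if (!IsKnownUrl(Request.Url.AbsolutePath))
    {
        var meta = new HtmlMeta { Name = "ROBOTS", Content = "NOINDEX, NOFOLLOW" };
        Page.Header.Controls.Add(meta);
    }
}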



 

@Cofer257

Webmaster Tools is notoriously slow at updating the links/errors page. In particular, even when a page is no longer linked to, Googlebot keeps requesting the page and reporting that it cannot be found.

If any of the URLs follow a common pattern you can do a 301 redirect to the correct page, which should speed up Google's removal of those errors. (Note: I wouldn't recommend adding thousands of lines to htaccess because that can seriously impact performance.)
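
Since the site in question is ASP.NET rather than Apache, a single pattern-based 301 could also be done in code, for example in Global.asax (a sketch only; the regex and target path are hypothetical):

// Hypothetical Global.asax sketch: one pattern-based permanent redirect
// instead of thousands of individual rules.
void Application_BeginRequest(object sender, EventArgs e)
{
    string path = HttpContext.Current.Request.Url.AbsolutePath;
    var match = System.Text.RegularExpressions.Regex.Match(path, "^/subfolder/(.+)$");
    if (match.Success)
    {
        // Requires .NET 4.0+; sends a 301 to the corrected location.
        HttpContext.Current.Response.RedirectPermanent("/folder/subfolder/" + match.Groups[1].Value);
    }
}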

Aside from that, unfortunately, there isn't much you can do besides waiting it out. If there are definitely no links pointing to the non-existent pages then the Crawl Errors section will slowly shrink over time. It can take up to 3 months in my experience.

Note this isn't the case for external links - on my sites I have several 404 errors coming from external links I have no control over and I don't think they will ever disappear.



 

@RJPawlick198

Does your 404 page return a true 404 or does it return a 200 with 404 content? I see a lot of custom 404 pages that say "page not found" but return a 200 status so Google thinks they are active pages and keeps them in their index.

Without having access to the pages to look them over it's hard to tell exactly what is going on but that seems to be the most common issue in my experience.
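
For an ASP.NET custom error page, making sure the real status code goes out might look roughly like this (a sketch, assuming IIS 7+ in integrated pipeline mode):

// Hypothetical code-behind for a custom "page not found" page: keep the
// friendly content but return a genuine 404 status.
protected void Page_Load(object sender, EventArgs e)
{
    Response.StatusCode = 404;
    Response.TrySkipIisCustomErrors = true; // stop IIS from swapping in its own error page
}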


