
Google is still crawling and indexing my old dummy test pages, which now return 404 Not Found

@Heady270

Posted in: #Indexing

I set up my site with sample pages and data (lorem ipsum, etc.), and Google crawled these pages. I then deleted all of these pages and added real content, but in Webmaster Tools I still get a lot of 404 errors from Google trying to crawl the old pages. I have set them to "mark as resolved", but some pages still come back as 404.

Furthermore, a lot of these sample pages are still listed when I do a search for my site on Google. How do I remove them? I think these irrelevant pages are hurting my ranking.

I actually wanted to erase all of these pages and have my site indexed from scratch as a new one, but I read that this isn't possible? (I have submitted a sitemap and used "Fetch as Google.")


3 Comments


 

@Nimeshi995

Okay. First things first: do not mark your 404s as fixed. You are actually prolonging the issue. Google will try to fetch a page that returns a 404 several times before giving up, because a 404 indicates a potentially temporary situation, whereas a 410 says the page is gone for good. So every time you mark a 404 as fixed, you are in effect telling Google to try again, starting the process of elimination all over again.

Just let these pages 404 for a while and Google will stop looking for them and drop them from the index. It will take time, but short of serving a 410, this is the easiest way. A 410 would make the process faster, but it takes extra configuration to serve a 410, whereas a 404 is the default, making it the easier and more natural solution.
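If you do want to go the 410 route and your server happens to be Apache with .htaccess enabled, it only takes a line or two of configuration. This is just a minimal sketch; the paths below are made-up placeholders for your own removed test URLs:

    # Placeholder paths - replace with your actual removed test pages
    # Return "410 Gone" for a single removed test page (mod_alias)
    Redirect gone /test-page.html
    # Or catch a whole directory of dummy pages with a pattern
    RedirectMatch gone ^/lorem-ipsum/.*$

(On nginx the equivalent is a location block with "return 410;".)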

Your removed pages will disappear in about 30 to 60 days if you can wait; it depends on how often Google visits your pages. It can take longer, but once 404s are found, Google tends to spot-check the site first and then, depending on how many 404s there are, may spider your site more aggressively.

Using a sitemap does not generally fix any problems with the index; it only makes life simpler for search engines. A sitemap is never taken as the be-all and end-all list of a site's pages: if a search engine finds pages that are not listed in the sitemap, it will still crawl and index them.

One option, if it makes sense for your site, is to list these pages in your robots.txt file. If there aren't too many of them (so that your robots.txt file would not become unwieldy), that would be a faster solution. Otherwise, I would just wait and let the 404 errors expire on their own.
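As an illustration, a robots.txt along these lines would keep crawlers away from the old sample URLs. The paths are invented placeholders; substitute the actual locations of your test pages:

    # Placeholder paths - replace with your own test URLs
    User-agent: *
    Disallow: /lorem-ipsum/
    Disallow: /sample-page.html
    Disallow: /test/

Keep in mind that Disallow only stops crawling; URLs that are already indexed can linger in the results until they drop out, which is one reason you may prefer to simply let the 404s run their course.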

One last word. You will be okay. Really. It will all work out very well for you if you are patient.



 

@Ann8826881

Google is likely to continue trying to crawl these pages for a long time. Webmasters make mistakes, or sites become unavailable for whatever reason, so Google won't remove content at the first sign of a 404.

Alternatively, you could serve a 410 Gone instead. This is a much stronger (i.e. deliberate) signal that the page really is "gone" and is not coming back, and it could prompt Google to remove the page from the SERPs sooner.
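If you're not sure which status your server actually sends for a removed page, you can check from the command line (the domain and path here are placeholders):

    curl -I http://example.com/old-test-page.html

The first line of the response shows the status, e.g. "HTTP/1.1 404 Not Found" or "HTTP/1.1 410 Gone", so you can confirm the 410 is really being served before expecting Google to act on it.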


I have set them to "mark as resolved" but some pages still come back as 404.


They are only "resolved" if you have put the page back. If you mark an error as resolved and the page still doesn't exist, the crawl error will simply recur. If the page doesn't exist, just leave it as it is.

Genuine 404s don't harm your search ranking. The 404 report in GWT is primarily for your benefit, so you can see when things go wrong... when pages that should be found can't be found!

These irrelevant pages in the SERPs are perhaps a minor annoyance to your users; however, what would anyone be searching for to land on your lorem ipsum?



 

@BetL925

Once you publish a page, Google will never forget about it. I have sites from which I removed pages 15 years ago. Googlebot still comes back and checks those pages occasionally.

To prevent the pages from showing up in the search results, your 404 errors will do the job. It may take Google a day to remove a page from the index after Googlebot next crawls it. If you want it removed faster, return a "410 Gone" status instead: Google removes 410 pages immediately after crawling them instead of waiting a day. Google doesn't remove 404 pages immediately in order to stop webmasters from shooting themselves in the foot, as described by Matt Cutts:


So with 404s, along with I think 401s and maybe 403s, if we see a page and we get a 404, we are gonna protect that page for 24 hours in the crawling system, so we sort of wait and we say maybe that was a transient 404, maybe it really wasn't intended to be a page not found.


Another method you could consider is redirection. 301 redirecting an old page to a replacement will prevent it from showing up as an error in Google Webmaster Tools. This is only possible if there is some new page for each of the old pages. Redirecting all the test pages to your home page won't help, because Google considers redirects to the home page to be "soft 404" errors that will still show up in that report.
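As a rough sketch of that approach on Apache with .htaccess (the old and new paths below are placeholders), each old test URL gets its own mapping to the real page that replaced it:

    # Placeholder paths - permanent (301) redirects from old test URLs to their replacements
    Redirect 301 /old-sample-about.html /about/
    Redirect 301 /lorem-products.html /products/

Again, this only makes sense where a genuine one-to-one replacement exists; as noted above, a blanket redirect to the home page is treated as a soft 404.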

Having 404 errors in Webmaster Tools won't hurt you. Having some 404 errors on your site may even help you because it shows Googlebot that your site is configured correctly. Here is what Google's John Mueller (who works on Webmaster Tools and Sitemaps) has to say about 404 errors that appear in Webmaster tools:


HELP! MY SITE HAS 939 CRAWL ERRORS!!1

I see this kind of question several times a week; you’re not alone - many websites have crawl errors.


1. 404 errors on invalid URLs do not harm your site’s indexing or ranking in any way. It doesn’t matter if there are 100 or 10 million, they won’t harm your site’s ranking. googlewebmastercentral.blogspot.ch/2011/05/do-404s-hurt-my-site.html

2. In some cases, crawl errors may come from a legitimate structural issue within your website or CMS. How can you tell? Double-check the origin of the crawl error. If there's a broken link on your site, in your page's static HTML, then that's always worth fixing. (thanks +Martino Mosna)

3. What about the funky URLs that are “clearly broken?” When our algorithms like your site, they may try to find more great content on it, for example by trying to discover new URLs in JavaScript. If we try those “URLs” and find a 404, that’s great and expected. We just don’t want to miss anything important (insert overly-attached Googlebot meme here). support.google.com/webmasters/bin/answer.py?answer=1154698

4. You don’t need to fix crawl errors in Webmaster Tools. The “mark as fixed” feature is only to help you, if you want to keep track of your progress there; it does not change anything in our web-search pipeline, so feel free to ignore it if you don’t need it. support.google.com/webmasters/bin/answer.py?answer=2467403

5. We list crawl errors in Webmaster Tools by priority, which is based on several factors. If the first page of crawl errors is clearly irrelevant, you probably won’t find important crawl errors on further pages. googlewebmastercentral.blogspot.ch/2012/03/crawl-errors-next-generation.html

6. There’s no need to “fix” crawl errors on your website. Finding 404’s is normal and expected of a healthy, well-configured website. If you have an equivalent new URL, then redirecting to it is a good practice. Otherwise, you should not create fake content, you should not redirect to your homepage, you shouldn’t robots.txt disallow those URLs -- all of these things make it harder for us to recognize your site’s structure and process it properly. We call these “soft 404” errors. support.google.com/webmasters/bin/answer.py?answer=181708

7. Obviously - if these crawl errors are showing up for URLs that you care about, perhaps URLs in your Sitemap file, then that’s something you should take action on immediately. If Googlebot can’t crawl your important URLs, then they may get dropped from our search results, and users might not be able to access them either.


