404s from nonexistent URLs
I've been having some indexing issues. A development team has been working on them for a week, but no progress has been made.
The site has an overwhelming number of 404 errors according to Google Search Console (GSC). The site has about 800 pages, but there are almost 1300 404 errors. It is built on WordPress.
All of these errors appear in the Desktop report in GSC but not in the Smartphone report.
The 404s all come from pages that I have no recollection of ever existing, and they all follow the same format: /URLPath/index.html.
One of these errors is /2014/12/index.html. The page /2014/12 exists, as do all of the other pages in the 404 list, as long as you drop the /index.html appended to the end of the URL.
Investigating the example above, I can see that the URL was crawled earlier in May and that GSC reports it as linked from 5 other URLs. None of these 5 URLs link to /2014/12/index.html or /2014/12/, and no changes have been made to this page since it was published.
The same pattern holds for all of the other 404 errors and their linked-from pages.
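The pattern described above (every 404 is an existing page with /index.html appended) can be checked mechanically against the list of real pages. The following is a minimal sketch, assuming the 404 URLs have been exported from GSC and the real page paths are available from the sitemap. The function names and the sample data are hypothetical and not taken from the site.

```python
from urllib.parse import urlsplit

INDEX_SUFFIX = "/index.html"


def strip_index_suffix(url: str) -> str:
    """Return the URL path without a trailing /index.html and without a trailing slash."""
    path = urlsplit(url).path
    if path.endswith(INDEX_SUFFIX):
        path = path[: -len(INDEX_SUFFIX)]
    return path.rstrip("/") or "/"


def classify_404s(error_urls, known_paths):
    """Split 404 URLs into those explained by an appended /index.html and the rest."""
    known = {p.rstrip("/") or "/" for p in known_paths}
    mapped, unexplained = [], []
    for url in error_urls:
        has_suffix = urlsplit(url).path.endswith(INDEX_SUFFIX)
        target = strip_index_suffix(url)
        if has_suffix and target in known:
            mapped.append((url, target))
        else:
            unexplained.append((url, target))
    return mapped, unexplained


if __name__ == "__main__":
    # Hypothetical sample data; real input would come from a GSC export and the sitemap.
    errors = ["/2014/12/index.html", "/tag/index.html", "/old-page"]
    pages = ["/2014/12", "/tag/", "/category/helloWorld/"]
    mapped, unexplained = classify_404s(errors, pages)
    print(len(mapped), "explained by /index.html;", len(unexplained), "need a closer look")
```

Anything that ends up in the unexplained list would be a 404 that does not fit the /index.html pattern and may deserve separate investigation.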
Is it possible that these crawl errors are contributing to my indexing problem? (A site: search shows Google has over 1400 pages indexed, but only about 800 exist.)
Why would these URLs, with /index.html appended, show up when I never created them?
NOTE:
All of these errors appeared over a 2 day period at the start of May.
This is happening for every tag and category as well: /tag/index.html, /category/index.html, /category/helloWorld/index.html, etc.
This website used to be www and now uses the non-www version of the site.
A sitemap was submitted at the end of May with only the 800-ish pages that exist, and only 98 are recorded as indexed by GSC.
EDIT:
I looked into the server logs to see who accessed the URLs ending in /index.html, but the entire server log is empty. There's no trace of anyone visiting a page that 404s.
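If the access log does start recording requests, a small filter could show which clients ask for the /index.html URLs and which referrer they send. This is only a sketch. It assumes the Apache/Nginx "combined" log format, which the post does not confirm, and the function names and sample lines are hypothetical.

```python
import re
from collections import Counter

# Combined log format (assumed):
# host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def index_html_404s(lines):
    """Yield (path, referer, agent) for 404 responses whose path ends in /index.html."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed format
        path = match.group("path").split("?", 1)[0]
        if match.group("status") == "404" and path.endswith("/index.html"):
            yield path, match.group("referer"), match.group("agent")


def count_by_agent(lines):
    """Count matching 404 requests per user-agent string."""
    return Counter(agent for _, _, agent in index_html_404s(lines))


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        '192.0.2.1 - - [01/May/2020:10:00:00 +0000] "GET /2014/12/index.html HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
        '192.0.2.2 - - [01/May/2020:10:00:05 +0000] "GET /2014/12/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    ]
    for agent, hits in count_by_agent(sample).items():
        print(hits, agent)
```

In practice the lines would be read from the log file itself, for example by passing an open file object to `count_by_agent`.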
1 Comment
Based on your edit, there is no cause for concern. 404 errors are very common in Search Console, simply because someone somewhere has linked to that URL. It could also be Googlebot having a glitch and trying to append index.html to the end of the URL.
This is also a common issue because many URL rewriting configurations leave out the file extension, so every URL looks like a folder. Under a default Apache configuration, that would mean each page lives in its own folder, every file is named index.html, and the folders denote the site structure. For a static HTML site this is no issue, as that may be exactly how it is set up. With rewritten URLs such as the ones WordPress uses, no such files exist, so it causes issues only in the sense that you will see a large number of 404 errors.
Basically, there is no need to worry about it, and it isn't going to affect your SERP rankings.
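To illustrate the difference described above, here is a toy model of the two setups. It is not how Apache or WordPress is actually implemented. The function names, file set, and route table are all made up for the example.

```python
def resolve_static(url_path, files):
    """Toy static server with a directory index: /a/b/ is served from a/b/index.html.

    Real servers usually also redirect /a/b to /a/b/; that step is left out here.
    """
    rel = url_path.lstrip("/")
    if rel == "" or rel.endswith("/"):
        rel += "index.html"
    return rel if rel in files else None


def resolve_rewritten(url_path, routes):
    """Toy rewrite-based router: only known permalinks match, no files are consulted."""
    key = url_path.rstrip("/") or "/"
    return routes.get(key)


if __name__ == "__main__":
    static_files = {"2014/12/index.html"}            # hypothetical static layout
    routes = {"/2014/12": "December 2014 archive"}   # hypothetical permalink table

    # On the static site both forms reach the same file.
    print(resolve_static("/2014/12/", static_files))
    print(resolve_static("/2014/12/index.html", static_files))

    # On the rewritten site only the permalink matches; the /index.html form would be a 404.
    print(resolve_rewritten("/2014/12/", routes))
    print(resolve_rewritten("/2014/12/index.html", routes))
```

In this model, a crawler or link that assumes the static layout would request /2014/12/index.html and get nothing back from the rewritten site, which matches the 404s in the report.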