Why is Google reporting CSV files as "soft 404"?
I have a few hundred soft 404 errors reported in Google Search Console. Almost all of them are for CSV files containing data. For example, here is the HTTP response for one of them:
HTTP/1.1 200
Content-Disposition: attachment; filename="fewer-bank-failures.csv"
Content-Length: 116
Content-Type: text/csv; name="fewer-bank-failures.csv";charset=UTF-8
Date: Thu, 01 Feb 2018 11:32:56 GMT
Server: Apache
Connection: keep-alive

"",Bank Failures
2000,2
2001,4
2002,11
2003,3
2004,4
2005,0
2006,0
2007,3
2008,25
2009,140
2010,157
2011,92
2012,51
Why is Google reporting that this is a soft 404? I've usually seen soft 404s reported because:
You have a "200 OK" status but say "not found" in the page
You redirect to the home page
The page is blank
I can't figure out why Google would think that this CSV file would indicate a not found error.
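The three usual causes listed above can be written down as a rough check. This is only a sketch of those three cases, not Google's actual classifier (which is not public); the function name `soft_404_reasons`, the `example.com` URLs, and the shortened CSV excerpt are assumptions made for illustration.

```python
def soft_404_reasons(status: int, body: str, requested_url: str,
                     final_url: str, home_url: str) -> list[str]:
    """Return which of the three usual soft-404 causes a response matches."""
    reasons = []
    if not 200 <= status < 300:
        # A real error status is a hard error, not a *soft* 404.
        return reasons
    if "not found" in body.lower():
        reasons.append("body says 'not found'")
    if final_url != requested_url and final_url.rstrip("/") == home_url.rstrip("/"):
        reasons.append("redirected to the home page")
    if not body.strip():
        reasons.append("blank page")
    return reasons


# Hypothetical URLs; the body is a shortened excerpt of the CSV shown above.
HOME = "https://example.com/"
CSV_URL = "https://example.com/fewer-bank-failures.csv"
CSV_BODY = '"",Bank Failures\n2000,2\n2001,4\n2008,25\n2009,140\n'

csv_reasons = soft_404_reasons(200, CSV_BODY, CSV_URL, CSV_URL, HOME)
```

For the CSV response, `csv_reasons` comes back empty: none of the three usual causes applies, which is why the report is surprising.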
I do understand other reasons that Google might not want to index this content:
It is a download attachment rather than a page
CSV wouldn't be the best landing page experience
The content is duplicate -- we have an HTML page with the same data, including a graph.
I would expect Google to choose not to index the page for one of those reasons, but I am completely surprised that they call it a "soft 404".
What can I do to tell Google that the page is real? Would using a Link: <https://example.com/fewer-bank-failures.html>; rel="canonical" HTTP header help?
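For illustration, the response headers with such a canonical `Link` header added might be assembled like this. It is a minimal sketch: the helper name `canonical_link_header` is made up, and how the headers actually get attached depends on the server or framework generating the CSV, which is not shown here.

```python
def canonical_link_header(canonical_url: str) -> str:
    """Build the value of a Link header that marks canonical_url as canonical."""
    return f'<{canonical_url}>; rel="canonical"'


# Headers for the CSV download, based on the response shown above,
# with the proposed Link header added.
headers = {
    "Content-Type": "text/csv; charset=UTF-8",
    "Content-Disposition": 'attachment; filename="fewer-bank-failures.csv"',
    "Link": canonical_link_header("https://example.com/fewer-bank-failures.html"),
}

for name, value in headers.items():
    print(f"{name}: {value}")
```

Whether Google would then stop flagging the CSV as a soft 404 is a separate question; the header only states which URL is the preferred one.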
1 Comment
Well, it does kinda fall within the realms of Google's definition of a soft 404 (emphasis mine):
A soft 404 means that a URL on your site returns a page telling the user that the page does not exist and also a 200-level (success) code to the browser. (In some cases, instead of a "not found" page, it might be a page with little or no usable content -- for example, a sparsely populated or empty page.)
So, from that "definition" you can't really say it's not a soft 404.
What can I do to tell Google that the page is real?
Is the .csv file really a "real page"?
To be honest, I wouldn't have thought you'd want these CSV files indexed anyway; or do you? I would have thought that blocking them with robots.txt would be the way to go (as @closetnoc suggested in comments)?
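As a rough sketch of the robots.txt route, the rules can be checked locally with Python's standard `urllib.robotparser` before deploying them. The `/csv/` directory and the `example.com` URLs are assumptions; the question does not say where the CSV files actually live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes all CSV downloads sit under /csv/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /csv/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

csv_allowed = parser.can_fetch(
    "Googlebot", "https://example.com/csv/fewer-bank-failures.csv")
html_allowed = parser.can_fetch(
    "Googlebot", "https://example.com/fewer-bank-failures.html")

print("CSV crawlable: ", csv_allowed)
print("HTML crawlable:", html_allowed)
```

If the CSV files share a directory with the HTML pages, a prefix rule like this won't work. Google's own robots.txt handling supports wildcard patterns such as `Disallow: /*.csv$`, but the standard-library parser used here only does plain prefix matching, so it can't be used to verify that form.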