301 redirect to 404 page or set status code to 404 and stay on page?
I have a number of pages on my website that only administrators can access, and access to these pages is given only if a query string value is found and correctly set. For example:
www.mydomain.com/show-daily-statistics?key=abc
The above link will show the content of the page but anything else such as the below will not:
www.mydomain.com/show-daily-statistics
Now I was thinking about what to do if search engines and/or non-admin users somehow land on these hidden pages.
I can of course either change the status code of the page to 404 or else 301 redirect to:
www.mydomain.com/404-error
What's the best solution in respect to Google and SEO?
I'd use a noindex,nofollow,noarchive tag in the head of the pages you want to get out of search.
I've found that the noarchive tag tends to get things out of search pretty damn quick, whereas noindex may stop a page getting into search; if it's already out there, you need to flush it out of the search results.
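A minimal Python sketch of emitting that tag in the page head. `render_admin_head` is a hypothetical helper, and the directive is only honoured by well-behaved crawlers, so it hides nothing from a person who already has the URL.

```python
from html import escape

# Robots meta directive: ask crawlers not to index this page, not to follow
# its links, and not to keep a cached copy.
ROBOTS_META = '<meta name="robots" content="noindex, nofollow, noarchive">'


def render_admin_head(title: str) -> str:
    """Hypothetical helper: build the <head> of an admin-only page."""
    return (
        "<head>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  {ROBOTS_META}\n"
        "</head>"
    )


print(render_admin_head("Daily statistics"))
```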
As for the admin access question, the other guys here have already given some advice on security that I'd recommend checking out.
The semantically correct HTTP response code for this situation would be 403 Forbidden:
The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
(Although the definition of the 403 response says that "authorization will not help", IMO this should be understood as referring specifically to HTTP Basic / Digest authentication, for which the status code 401 Unauthorized should be used instead. Since you're not using either of those authentication methods, 403 is the appropriate status code in your case.)
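A minimal, framework-agnostic sketch of the 403 approach. `handle` is a hypothetical function returning `(status, headers, body)` that you would wire into your own server; the path, the `key` parameter and the value `abc` come from the question. Following the quoted definition, the body states the reason for the refusal.

```python
from urllib.parse import parse_qs

ADMIN_KEY = "abc"  # value from the question; a placeholder, not a real secret


def handle(path: str, query: str) -> tuple[str, list[tuple[str, str]], bytes]:
    """Hypothetical handler: serve the admin page only with ?key=<ADMIN_KEY>."""
    if path != "/show-daily-statistics":
        return "404 Not Found", [("Content-Type", "text/plain")], b"Not found.\n"
    key = parse_qs(query).get("key", [None])[0]
    if key != ADMIN_KEY:
        # 403: the URL exists, but the server refuses to serve it,
        # and the body explains why.
        body = b"Access to this page is restricted to administrators.\n"
        return "403 Forbidden", [("Content-Type", "text/plain")], body
    return "200 OK", [("Content-Type", "text/html")], b"<h1>Daily statistics</h1>\n"


print(handle("/show-daily-statistics", "")[0])         # 403 Forbidden
print(handle("/show-daily-statistics", "key=abc")[0])  # 200 OK
```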
However, using a 403 status code reveals (or at least strongly implies) the fact that there is a page with that URL, even though the server is refusing to deliver it. As this is something that you may wish to conceal from potential intruders, the HTTP/1.1 standard explicitly allows the 404 Not Found status code to be returned instead (emphasis mine):
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
Of course, to make such concealment effective at all, the 404 error page you return needs to appear identical to what you return for actual non-existent pages. Otherwise, it will only fool the dumbest and most casual attackers. (If your goal is just to keep the pages out of Google's index, a 403 response will do that just as well.)
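One way to guarantee that, sketched in Python under the same assumptions as the question (`key=abc`): route every "not allowed" and "doesn't exist" case through a single shared 404 response. `handle`, `PUBLIC_PAGES` and `ADMIN_PAGES` are hypothetical names.

```python
from urllib.parse import parse_qs

ADMIN_KEY = "abc"  # value from the question; a placeholder, not a real secret
HTML = [("Content-Type", "text/html; charset=utf-8")]

# One shared 404 response: a concealed admin page must look exactly like
# a URL that genuinely does not exist (same status, headers and body).
NOT_FOUND = ("404 Not Found", HTML, b"<h1>Page not found</h1>\n")

PUBLIC_PAGES = {"/": b"<h1>Home</h1>\n"}
ADMIN_PAGES = {"/show-daily-statistics": b"<h1>Daily statistics</h1>\n"}


def handle(path, query):
    """Hypothetical handler returning (status, headers, body)."""
    if path in PUBLIC_PAGES:
        return "200 OK", HTML, PUBLIC_PAGES[path]
    if path in ADMIN_PAGES and parse_qs(query).get("key") == [ADMIN_KEY]:
        return "200 OK", HTML, ADMIN_PAGES[path]
    # Unknown URL, missing key or wrong key: all get the identical 404,
    # served in place at the requested URL (no redirect).
    return NOT_FOUND
```

With this shape, `handle("/show-daily-statistics", "")` and `handle("/no-such-page", "")` return the very same tuple, so nothing in the response distinguishes the hidden page from a missing one.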
What about the other possible responses suggested in your question and the other answers?
As I noted earlier, I do not believe that a 401 response is appropriate here. It may work in practice, insofar as most browsers and search engines will treat any malformed or unrecognized 4xx series response code as if it were a 404, but it's still not valid according to the HTTP spec, and there's no practical reason to prefer it over 403 or 404.
As for using a 301 (or 302) redirect to a separate "404 error" page, that's an awful practice spread by sloppy mod_rewrite tutorials, and has absolutely no redeeming features as compared to returning a 404 response directly:
It's confusing to visitors, as the URL they were trying to visit gets replaced by the URL of the error page. Thus, they see a message saying they've reached a non-existent page, but no easily visible indication of what the page they were trying to visit was, and so cannot easily attempt any recovery strategies like fixing any obvious typos in the URL, or copy-and-pasting it into Google or the Wayback Machine.
It may confuse search engines, especially if your 404 page is disallowed in robots.txt, or if it incorrectly returns a 200 OK response instead of a real 404 status code ("soft 404"), potentially causing your 404 page to appear in search results for random search terms.
It causes (a small amount of) extra load on your servers, increases the response time to visitors and potentially slows down search engines crawling your site, as every request for a non-existent (or concealed) page now involves an extra HTTP round-trip.
It has no SEO benefit, as any "link juice" from pages redirected to a 404 page is lost anyway.
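To make the first and third points concrete, a small Python simulation. All names are hypothetical, and both handlers model only a request for a concealed or non-existent URL; `fetch` plays the part of a browser that follows `Location` headers.

```python
def redirect_style(path):
    """The pattern criticised above: 301 to a separate error page."""
    if path == "/404-error":
        return "404 Not Found", {}, b"Page not found"
    return "301 Moved Permanently", {"Location": "/404-error"}, b""


def direct_style(path):
    """Return the 404 status directly at the requested URL."""
    return "404 Not Found", {}, b"Page not found"


def fetch(handler, path, max_hops=5):
    """Simulated client that follows redirects like a browser.

    Returns (final_path, final_status, number_of_requests).
    """
    requests = 0
    while True:
        requests += 1
        status, headers, _body = handler(path)
        if status.startswith("3") and "Location" in headers and requests <= max_hops:
            path = headers["Location"]
            continue
        return path, status, requests


print(fetch(redirect_style, "/show-daily-statistics"))  # ('/404-error', '404 Not Found', 2)
print(fetch(direct_style, "/show-daily-statistics"))    # ('/show-daily-statistics', '404 Not Found', 1)
```

The redirect version costs an extra request and the visitor ends up looking at `/404-error` instead of the URL they asked for; the direct version answers in one request and keeps the original URL visible.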
(Of course, the one situation where you do want to use a 301 redirect instead of a 404 response is when the page actually has moved, and you can redirect the visitor to its correct location. But that's not the case discussed here.)
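A sketch of that distinction, assuming a hypothetical hand-maintained `MOVED` table of old-to-new paths (key checking is left out to keep the focus on redirect versus 404):

```python
# Hypothetical map of pages that really have moved: old path -> new path.
MOVED = {"/daily-stats": "/show-daily-statistics"}
EXISTING = {"/show-daily-statistics": b"<h1>Daily statistics</h1>\n"}


def handle(path):
    """Hypothetical handler returning (status, headers, body)."""
    if path in EXISTING:
        return "200 OK", [("Content-Type", "text/html")], EXISTING[path]
    if path in MOVED:
        # The content lives on at a new address: a permanent redirect fits.
        return "301 Moved Permanently", [("Location", MOVED[path])], b""
    # No forwarding address: answer 404 in place instead of redirecting.
    return "404 Not Found", [("Content-Type", "text/plain")], b"Not found.\n"


print(handle("/daily-stats")[0])   # 301 Moved Permanently
print(handle("/no-such-page")[0])  # 404 Not Found
```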
Finally, I would like to echo the sentiment, expressed in many comments here, that merely "hiding" your admin pages like this is not an adequate substitute for proper password-based authentication. That said, if you already have a secure authentication system set up, hiding the pages may be useful as an extra layer, albeit a fairly weak one, in a defense in depth approach.
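As one illustration of that layering, a Python sketch that checks the hidden key first (visitors without it get an ordinary 404) and then requires a password via HTTP Basic authentication, answering with a 401 plus the `WWW-Authenticate` challenge the spec requires. Everything here is an assumption for illustration: the hard-coded placeholder credentials, the choice of Basic auth as the simplest password scheme to show, and the `handle` signature. In practice you would use your framework's login system, stored password hashes and HTTPS.

```python
import base64
import hmac
from urllib.parse import parse_qs

# Placeholders only: load real secrets from configuration, store a password
# hash rather than the password, and serve the admin area over HTTPS only.
ADMIN_USER = "admin"
ADMIN_PASSWORD = "correct horse battery staple"
ADMIN_KEY = "abc"  # the "hidden page" key from the question


def _same(a: str, b: str) -> bool:
    """Constant-time string comparison."""
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def _credentials_ok(auth_header):
    """Validate an 'Authorization: Basic <base64>' header value."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[6:], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error is a ValueError
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Evaluate both comparisons so timing doesn't reveal which one failed.
    user_ok = _same(user, ADMIN_USER)
    pass_ok = _same(password, ADMIN_PASSWORD)
    return user_ok and pass_ok


def handle(query, auth_header):
    """Hypothetical handler for the admin page: (status, headers, body)."""
    key = parse_qs(query).get("key", [""])[0]
    if not _same(key, ADMIN_KEY):
        # Outer, weak layer: look exactly like a missing page.
        return "404 Not Found", [("Content-Type", "text/plain")], b"Not found.\n"
    if not _credentials_ok(auth_header):
        # Real authentication: a 401 MUST carry a WWW-Authenticate challenge.
        return "401 Unauthorized", [("WWW-Authenticate", 'Basic realm="admin"')], b""
    return "200 OK", [("Content-Type", "text/html")], b"<h1>Daily statistics</h1>\n"


print(handle("", None)[0])         # 404 Not Found
print(handle("key=abc", None)[0])  # 401 Unauthorized
```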
The correct code would be 401 Unauthorized, as per the HTTP specification:
10.4.2 401 Unauthorized
The request requires user authentication. The response MUST include a WWW-Authenticate header field (section 14.47) containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field (section 14.8). If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity might include relevant diagnostic information. HTTP access authentication is explained in "HTTP Authentication: Basic and Digest Access Authentication" [43].
or alternatively
10.4.4 403 Forbidden
The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.
Both of these are semantically more correct than 404. The resource exists, so 404 isn't correct. 401 should be correct, but you aren't requiring authentication, and security by obscurity isn't security. 403 is also correct, as the request is understood and the resource exists; the server just refuses to service the request. 404 is appropriate if you don't want to reveal why the 403 is happening.
In any case 301 redirects are not appropriate, the resource hasn't moved.
Since these are pages for administrators, with or without the "key" parameter, they can't and should not be indexed. Therefore the page can send a 404 status code to non-admin visitors, and you can leave the URL intact. Do not redirect: a 301 tells Google that the page has moved, but then points it to a page that doesn't exist.
This is how Google does it as well. See what happens when you go to a dummy page: www.google.com/analytics/asdsas
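You can run the same experiment against your own site by requesting a URL without following redirects. A self-contained Python sketch using only the standard library: it starts a throwaway local server that answers every path with a 404 in place, then checks the status. `check_status` is a hypothetical helper; to test a real site, point it at your own host and port instead of the local server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class NotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every path is "unknown" here: answer 404 at the requested URL.
        body = b"<h1>Page not found</h1>\n"
        self.send_response(404)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def check_status(host, port, path):
    """Return (status, Location header); http.client never follows redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


server = ThreadingHTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
status, location = check_status(host, port, "/analytics/asdsas")
print(status, location)  # 404 None
server.shutdown()
server.server_close()
```

A 200 for a page that displays an error message would indicate a "soft 404", and a 301 or 302 with a `Location` header would indicate the redirect-to-error-page pattern discussed above.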