
Should I return an HTTP 401 status code on an HTML-based login form?

@Annie201

Posted in: #SearchEngines #Standards

Should I return an HTTP 401 status code on an HTML-based login form? The page is a dedicated login form and has no other meaningful content, just the site framework. The URL, however, can be that of a page that does have meaningful content but requires login. Note that this setup only returns the status code 401 and doesn't prompt the user for basic authentication.

Looking at the standards, it seems that 401 is an inappropriate status code for HTML-based login forms. However, I have never experienced or heard of any ill consequences of using it.

When sending 401, "The response MUST include a WWW-Authenticate header field (section 14.47) containing a challenge applicable to the requested resource."

This requirement is stated in RFC 2616, section 10.4.2 (tools.ietf.org/html/rfc2616#section-10.4.2) and detailed in RFC 2617, section 3.2.1 (tools.ietf.org/html/rfc2617#section-3.2.1).

I know there are ways to convince search engines to index, or not index, pages based on the presence of a login form, but I'd prefer to use HTTP status codes, specifically 401, since its definition seems like a perfect match if not for the WWW-Authenticate header requirement.

Is there any reason why I shouldn't use 401 in this case? Semantically, is there any difference between not being authorised at the HTTP level and not being authorised at the application level? Obviously you can have both, but isn't authentication at the HTTP level just a convenience that saves implementing it at the application level?


2 Comments


@Carla537

As you note, RFC 2616 requires that a 401 response be accompanied by an RFC 2617 WWW-Authenticate header. I suppose you could technically comply with that requirement by sending a bogus header like:

WWW-Authenticate: Bogus realm="blahblah", comment="use form to log in"


but I have no idea what browsers will do if presented with a 401 response containing no challenges they understand. I would assume that most if not all of them would present the response body to the user (as RFC 2616 says they should do if authentication fails), but neither RFC seems to say so explicitly, so they might legitimately just show a generic error message instead.

A possible alternative (if you don't want to just use a 200 response like everyone else seems to do) would be to use a 403 Forbidden status code. This is a widely used response code, and as far as I know, almost all interactive user agents (i.e. browsers, as opposed to, say, search engines or download managers) should react to it by presenting the content to the user, at least if it's long enough.

Although the description of the 403 status code says that "[a]uthorization will not help", this should IMO be understood in context as referring to RFC 2617 authentication or similar protocol-level authorization mechanisms; as far as the browser is concerned, it has no idea whether submitting a form and receiving a cookie in response counts as "authorization" or something else.
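
To make the two status-code options above concrete, here is a minimal sketch of a WSGI application that serves the login form either as a 401 with the placeholder challenge header or as a 403. The form markup, the `/login` action and the function name `make_login_app` are assumptions made for illustration, and how a given browser treats the unregistered "Bogus" scheme is exactly the open question noted above.

```python
# Hypothetical login-form markup; a real page would include the site framework.
LOGIN_FORM = (
    b"<!doctype html><title>Log in</title>"
    b'<form method="post" action="/login">'
    b'<input name="user"><input name="password" type="password">'
    b"<button>Log in</button></form>"
)


def make_login_app(use_401=True):
    """Return a WSGI app that serves the login form with a 401 or 403 status."""

    def app(environ, start_response):
        headers = [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(LOGIN_FORM))),
        ]
        if use_401:
            status = "401 Unauthorized"
            # Placeholder challenge from the text above; "Bogus" is not a
            # registered authentication scheme.
            headers.append(
                ("WWW-Authenticate",
                 'Bogus realm="blahblah", comment="use form to log in"')
            )
        else:
            status = "403 Forbidden"
        start_response(status, headers)
        return [LOGIN_FORM]

    return app
```

For a quick local experiment, the app can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, make_login_app()).serve_forever()`, and then opened in different browsers to see how each one renders the response.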

One more commonly used mechanism would be to respond to unauthenticated requests with a temporary redirect to a separate login page, with the original URL passed as a parameter so that the user can be redirected back to it after successful authentication. However, note that a naive implementation could allow a malicious person to craft a login link that would redirect the user to an arbitrary URL after logging in. If this could be a security issue, you should take steps to prevent it, e.g. by only accepting return URLs matching a known safe pattern, or by protecting the return URL with a message authentication code to prevent modification.
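
The two safeguards mentioned above can be sketched roughly as follows. This is an illustration only: the "same-site path" rule is one possible safe pattern, `SECRET_KEY` and all function names are hypothetical, and a real deployment may need stricter checks.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

# Hypothetical secret; in practice load it from configuration, not source code.
SECRET_KEY = b"replace-with-a-real-secret"


def is_safe_return_path(url):
    """Accept only same-site paths such as "/account?tab=1"."""
    # Reject whitespace and control characters, which some browsers strip
    # before resolving the URL.
    if any(ch.isspace() or ord(ch) < 0x20 for ch in url):
        return False
    # Must be a path on this site: one leading slash, no "//host" form,
    # and no backslashes that a browser might treat as slashes.
    if not url.startswith("/") or url.startswith("//") or "\\" in url:
        return False
    parts = urlsplit(url)
    # A same-site path has neither a scheme nor a host component.
    return parts.scheme == "" and parts.netloc == ""


def sign_return_url(url):
    """Return a hex HMAC-SHA256 tag for the return URL."""
    return hmac.new(SECRET_KEY, url.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_return_url(url, mac):
    """Check the tag in constant time before trusting the return URL."""
    expected = sign_return_url(url).encode("ascii")
    return hmac.compare_digest(expected, mac.encode("utf-8"))
```

With the pattern approach, the login handler would redirect only when `is_safe_return_path` accepts the parameter and fall back to a default page otherwise. With the MAC approach, the server attaches the tag when it issues the redirect to the login page and refuses any return URL whose tag fails to verify.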

In any case, if you're using HTTP cookies to store authentication tokens after login, you should include a Vary header in your responses (both before and after authentication) to prevent inappropriate caching, as in Vary: Cookie.
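
As a small sketch of that last point, the helper below makes sure a list of response headers carries a Vary field that covers Cookie, without clobbering a Vary value that is already present. The function name and the `(name, value)` header-list shape (as used by WSGI) are assumptions chosen for illustration.

```python
def add_vary_cookie(headers):
    """Return a copy of a (name, value) header list whose Vary covers Cookie."""
    result = []
    found_vary = False
    for name, value in headers:
        if name.lower() == "vary":
            found_vary = True
            fields = [f.strip() for f in value.split(",") if f.strip()]
            # "*" already means the response varies on everything.
            if "*" not in fields and "cookie" not in (f.lower() for f in fields):
                fields.append("Cookie")
            value = ", ".join(fields)
        result.append((name, value))
    if not found_vary:
        result.append(("Vary", "Cookie"))
    return result
```

The same helper would be applied to responses sent both before and after authentication, so that a shared cache does not hand a logged-in user's page to an anonymous visitor or the other way round.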


@Jamie184

First, if the page needs a login, you should probably block it with robots.txt.

Second, if robots do reach the page, a 401 response is appropriate.
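
As a rough illustration of the robots.txt suggestion, the sketch below checks a rule set with Python's standard `urllib.robotparser`. The `/members/` path, the rule set itself and the `ExampleBot` user agent are hypothetical; substitute whatever prefix your login-only pages actually live under.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: keep all crawlers out of a login-only area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_may_fetch(path, user_agent="ExampleBot"):
    """Report whether the rules above let the given crawler fetch the path."""
    return parser.can_fetch(user_agent, path)
```

Here `crawler_may_fetch("/members/profile")` is refused by the Disallow rule, while a public path such as `/about` is still allowed. Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honour it.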
