Berryessa370

What’s up with this 302 redirect to the same URL?

@Berryessa370

Posted in: #302Redirect #Http

I was scanning my website for links that point to HTTP 301/302/303 redirects when I found a puzzling behavior of the curl utility. Consider the output from the following three commands:

$ curl -I jekyllrb.com
HTTP/1.1 302 Found
Connection: close
Pragma: no-cache
cache-control: no-cache
Location: /

$ curl -I jekyllrb.com/
HTTP/1.1 302 Found
Connection: close
Pragma: no-cache
cache-control: no-cache
Location: /

$ curl -LI jekyllrb.com/
HTTP/1.1 200 OK
Server: GitHub.com
Date: Tue, 30 Dec 2014 01:31:47 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 8177
Last-Modified: Mon, 22 Dec 2014 14:17:25 GMT
Expires: Tue, 30 Dec 2014 01:41:47 GMT
Cache-Control: max-age=600
Vary: Accept-Encoding
Accept-Ranges: bytes


I understand the output from the first command: if you request the “naked” domain you will be redirected to the path “/”. But when you actually request the indicated path you seem to be redirected to the URL you just requested!
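To make the "redirected to the URL you just requested" observation concrete, here is a minimal Python sketch that resolves a Location value against the requested URL in the usual relative-reference way. The function names are hypothetical, and the `http://` scheme is an assumption (curl defaults to it when none is given).

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Return the URL with an empty path replaced by "/"."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path or "/"))


def redirect_target(request_url: str, location: str) -> str:
    """Resolve a Location header value against the URL that was requested."""
    return urljoin(normalize(request_url), location)


def is_self_redirect(request_url: str, location: str) -> bool:
    """True when the redirect points back at the URL that was just requested."""
    return redirect_target(request_url, location) == normalize(request_url)
```

Under these assumptions, both `http://jekyllrb.com` and `http://jekyllrb.com/` combined with `Location: /` resolve to `http://jekyllrb.com/`, so `is_self_redirect` reports a redirect to the same URL in both cases.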

Then, when you add the -L option to tell curl to follow redirects, it looks as if you’re taken directly to the real page without any intermediate steps! (Usually when curl follows redirects it prints a set of headers for each request—if there had been an HTTP 302 in there somewhere then it should have been shown before the HTTP 200.)

Can anyone explain to me (1) why the redirect to the same URL is valid; and (2) why including the -L flag seems not to follow, but instead to completely bypass the redirect?

2 Comments


@Murphy175

Wow, that is weird, indeed.

As the other answer mentions, the slash after the domain name is always sent in an HTTP request, so you would NOT expect to receive a redirect just because you accessed the domain "without" any specific path.

I tried your domain with curl -v jekyllrb.com, and indeed I got 200 OK and the page content only on the third attempt, after being given 302 Found on the prior two requests.




My guess would be that a couple of redirects to the same URL are seamless for well-behaved clients but might trip up some broken bots, so GitHub might be using this as a technique to thwart email harvesters or the like. Perhaps it is similar in concept to greylisting in SMTP?
Another potential benefit might be the speed of fulfilling individual requests: perhaps the server doesn't have the page content at hand, so it issues a redirect to itself instead, ensuring that each individual request is answered very quickly.
Or perhaps it is to minimise bandwidth use by giving a redirect to non-browser clients, which might not bother to make a second request? Just another far-fetched guess.




Cns:cnst {8395} curl -v jekyllrb.com; date
* About to connect() to jekyllrb.com port 80 (#0)
* Trying 192.30.252.153...
* connected
* Connected to jekyllrb.com (192.30.252.153) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.26.0
> Host: jekyllrb.com
> Accept: */*
>
< HTTP/1.1 302 Found
< Connection: close
< Pragma: no-cache
< cache-control: no-cache
< Location: /
<
* Closing connection #0
Tue Dec 30 10:57:35 PST 2014
Cns:cnst {8396} curl -v jekyllrb.com ; date
* About to connect() to jekyllrb.com port 80 (#0)
* Trying 192.30.252.153...
* connected
* Connected to jekyllrb.com (192.30.252.153) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.26.0
> Host: jekyllrb.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: GitHub.com
< Date: Tue, 30 Dec 2014 18:57:36 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 8177
< Last-Modified: Mon, 22 Dec 2014 14:17:25 GMT
< Expires: Tue, 30 Dec 2014 19:07:36 GMT
< Cache-Control: max-age=600
< Accept-Ranges: bytes
<
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>Jekyll &bull; Simple, blog-aware, static sites</title>


@Megan663

Your first two requests are the same. All clients (including curl) have to send the slash after the domain name as part of the HTTP request, whether or not it appears in the URL. There is no way to formulate a valid request for the site root without it. A minimal HTTP request is:

GET / HTTP/1.0
host: jekyllrb.com


Omitting the slash from that request line will result in a "400 Bad Request" error.
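As a rough illustration of why the first two commands are identical on the wire, the sketch below builds the minimal request from a URL the way a client plausibly would. `build_request` is a hypothetical helper, not curl's actual code, and the `http://` scheme is assumed.

```python
from urllib.parse import urlsplit


def build_request(url: str, method: str = "GET") -> bytes:
    """Build a minimal HTTP/1.0 request for url, as raw bytes."""
    parts = urlsplit(url)
    # An empty path is sent as "/": the request line always needs a target.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    request = f"{method} {target} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"
    return request.encode("ascii")
```

With this helper, `build_request("http://jekyllrb.com")` and `build_request("http://jekyllrb.com/")` produce the same bytes, beginning with `GET / HTTP/1.0`.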

It appears that the behavior is intermittent. Sometimes the server responds with a redirect and sometimes it does not. I've tried it dozens of times myself with curl and I occasionally get the redirect, but most of the time I do not.
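One way to check intermittent behavior like this is to repeat the request and count the status codes. The following is a sketch using Python's standard `http.client`, which does not follow redirects on its own; the helper names and the attempt count are assumptions, and results today may differ from what the server returned in 2014.

```python
import http.client
from collections import Counter
from typing import Callable


def head_status(host: str, path: str = "/") -> int:
    """Send one HEAD request over plain HTTP and return the status code."""
    conn = http.client.HTTPConnection(host, 80, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def tally(fetch_status: Callable[[], int], attempts: int) -> Counter:
    """Call fetch_status the given number of times and count each status code."""
    return Counter(fetch_status() for _ in range(attempts))
```

A call such as `tally(lambda: head_status("jekyllrb.com"), attempts=20)` would return a counter keyed by status code, showing how often the 302 appears relative to the 200.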

Redirecting to the same URL is occasionally used to set cookies and then test whether they were set. I don't see any cookies being set on this request, but with the -L option curl may simply retry a request that redirects to the same URL.
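For reference, here is a hedged sketch of what following redirects hop by hop looks like. It is not curl's implementation: `follow` is a hypothetical function, and the fetcher is passed in so the logic can be tried without a network. Each hop is recorded, so a self-redirect followed by a success would show up as two entries, while a lone 200 means the first response was not a redirect at all.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# A fetcher takes a URL and returns (status code, Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]


def follow(fetch: Fetcher, url: str, max_hops: int = 10) -> List[Tuple[int, str]]:
    """Request url, follow 301/302/303 responses, and record every hop."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in (301, 302, 303) or location is None:
            break
        # Resolve relative Location values such as "/" against the current URL.
        url = urljoin(url, location)
    return hops
```

Feeding it a fake server that answers 302 with `Location: /` once and then 200 yields two hops for the same URL, which is the pattern the question expected to see from -L if a redirect had actually been issued on that request.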
