Handling bots that request URLs in paths
In my server log, I found at least one IP address requesting a full URL in an awkward place. For example, the request line the client sends to my server is this:
GET www.3rdpartysite.com/file.php HTTP/1.1
I was expecting the request to look more like this:
GET /path/to/file.php HTTP/1.1
Host: example.com
This makes me think hackers are trying to break my website, but then I look here at www.w3.org/Protocols/rfc2616/rfc2616-sec5.html and it talks about that first GET request being valid for proxies.
My server has cPanel and WHM installed, but I don't use proxies for my website. My question then is, if I force Apache to return an error or a redirect for all HTTP requests beginning with...
GET
...and I require remote systems to send requests in this format...
GET /path/to/resource HTTP/x.x
Host: example.com
Would my idea work with all web browsers, or would at least one legitimate web browser break?
I just have a feeling some hacker is using my server to connect to another.
More posts by @LarsenBagley505
2 Comments
The HTTP 1.1 spec is very clear that
GET /path/to/resource HTTP/1.1
Host: example.com
and
GET http://example.com/path/to/resource HTTP/1.1
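are equivalent requests. This is because a request starts with a Request-Line, which is defined as Method Request-URI HTTP-Version, and the Request-URI can take any of these forms: "*" | absoluteURI | abs_path | authority. The first request above uses the abs_path form together with a Host header; the second uses the absoluteURI form.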
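To make the four Request-URI forms concrete, here is a minimal Python sketch that labels a request line by the form it uses. The function name `classify_request_uri` is made up for illustration, and the scheme check is a simplification (it only looks for `scheme://`, which is narrower than the full absoluteURI grammar), so treat it as a rough aid for reading logs rather than a spec-complete parser.

```python
import re

# Simplified test for "scheme://"; the real absoluteURI grammar is broader.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def classify_request_uri(request_line):
    """Name the RFC 2616 Request-URI form used by a request line."""
    parts = request_line.split()
    if len(parts) != 3:
        raise ValueError("expected: Method Request-URI HTTP-Version")
    _method, uri, _version = parts

    if uri == "*":
        return "*"                # e.g. OPTIONS * HTTP/1.1
    if uri.startswith("/"):
        return "abs_path"         # the form browsers normally send
    if _SCHEME.match(uri):
        return "absoluteURI"      # the form used when talking to a proxy
    if "/" not in uri:
        return "authority"        # host[:port], as used by CONNECT
    return "unrecognized"         # matches none of the four forms


if __name__ == "__main__":
    for line in (
        "GET /path/to/resource HTTP/1.1",
        "GET http://example.com/path/to/resource HTTP/1.1",
    ):
        print(line, "->", classify_request_uri(line))
```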
You should not try to configure your web server to respond differently to the different formats of requests. You would be breaking the spec. While browsers today typically use the former request format, there is no guarantee that they will continue to do so in the future. You don't want your website to suddenly stop working with the latest version of some browser.
You should instead ensure that your server does not serve content for unknown hosts. A request for any third-party site should return a 404 Not Found (or possibly even a 400 Bad Request). Bots that request third-party sites are typically testing for open proxy servers.
One way to configure your web server to do so is to configure the first (default) virtual host to return a 404 page. Every legitimate site would be in a later virtual host directive.
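The decision a catch-all default host makes can be sketched in a few lines of Python. This is not Apache configuration, only an illustration of the logic: work out which host the request is for (from the absolute URI if there is one, otherwise from the Host header) and answer 404 unless it is one of your own sites. The names `KNOWN_HOSTS` and `status_for` are hypothetical, and the host list is a placeholder.

```python
from urllib.parse import urlsplit

# Hypothetical list of sites this server is meant to serve.
KNOWN_HOSTS = {"example.com", "www.example.com"}


def status_for(request_uri, host_header, known_hosts=KNOWN_HOSTS):
    """Return the HTTP status a host-checking server might answer with."""
    if request_uri.startswith("/"):
        # Origin form: the target host comes from the Host header.
        host = urlsplit("//" + host_header).hostname if host_header else None
    elif "://" in request_uri:
        # Absolute form: the host inside the URI takes precedence.
        host = urlsplit(request_uri).hostname
    else:
        return 400  # neither a path nor an absolute URI

    if host is None:
        return 400  # no usable host information at all
    # urlsplit lowercases the hostname and drops any port.
    return 200 if host in known_hosts else 404


if __name__ == "__main__":
    print(status_for("/path/to/file.php", "example.com"))
    print(status_for("http://www.3rdpartysite.com/file.php", "example.com"))
```

With this approach both request formats keep working for your own sites, while requests aimed at third-party hosts are refused regardless of which format they use.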
These requests happen all the time. I see them at least a dozen times a day in my server logs. The best bet is to block the connection at the firewall or gateway so that it never reaches your server. Otherwise, if it isn't causing you any hassle and you aren't seeing other errors related to this connection, you can pretty safely ignore it.