When webservers send a page, why don't they send all required CSS, JS, and images without being asked?

@Candy875

Posted in: #Http #Performance #Resources #Webserver

When a web page contains a single CSS file and an image, why do browsers and servers waste time with this traditional time-consuming route:


Browser sends an initial GET request for the web page and waits for the server's response.
Browser sends another GET request for the CSS file and waits for the server's response.
Browser sends another GET request for the image file and waits for the server's response.


When instead they could use this short, direct, time-saving route?


Browser sends a GET request for a web page.
The web server responds with index.html, followed by style.css and image.jpg.


6 Comments


 

@Kaufman445

Because, in your example, the web server would always send the CSS and images regardless of whether the client already has them, greatly wasting bandwidth (and thus making the connection slower rather than faster by reducing latency, which was presumably your intention).
Note that CSS, JavaScript and image files are usually sent with very long expiry times for exactly this reason (when you do need to change them, you just change the file name to force a new copy, which will again get cached for a long time).
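For example, the response headers for a fingerprinted stylesheet such as style.abc123.css (a hypothetical name; the max-age and ETag values are purely illustrative) might look roughly like this:

HTTP/1.1 200 OK
Content-Type: text/css
Cache-Control: public, max-age=31536000
ETag: "70b26618ce2c246c71"

With headers like these, the browser will not even ask for the file again until a year has passed or the page starts referencing a new file name.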

Now, you could try to work around that waste of bandwidth by saying "OK, but the client could indicate that it already has some of those resources, so the server would not send them again". Something like:

GET /index.html HTTP/1.1
Host: example.com
If-None-Match: "686897696a7c876b7e"
Connection: Keep-Alive

GET /style.css HTTP/1.1
Host: example.com
If-None-Match: "70b26618ce2c246c71"

GET /image.png HTTP/1.1
Host: example.com
If-None-Match: "16d5b7c2e50e571a46"


And then only the files that have actually changed are sent, over a single TCP connection (using HTTP pipelining over a persistent connection). And guess what? That is how it already works (you could also use If-Modified-Since instead of If-None-Match).
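The responses coming back over that same connection would then look roughly like this: a full 200 response only for the document that actually changed, and empty 304 Not Modified responses for the resources the browser already has (the ETag on the HTML is an illustrative value):

HTTP/1.1 200 OK
Content-Type: text/html
ETag: "7a1d2f3e4b5c6d7e8f"

...new index.html body...

HTTP/1.1 304 Not Modified
ETag: "70b26618ce2c246c71"

HTTP/1.1 304 Not Modified
ETag: "16d5b7c2e50e571a46"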



But if you really want to reduce latency by using lots of extra bandwidth (as in your original request), you can do that today with standard HTTP/1.1 when designing your website. The reason most people don't is that they don't think it is worth it.

To do it, you do not need to keep CSS or JavaScript in separate files; you can include them in the main HTML file using <style> and <script> tags (you probably do not even need to do this manually, as your template engine can probably do it automatically). You can even include images in the HTML file using data URIs, like this:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />


Of course, base64 encoding increases the bandwidth usage slightly, but if you don't care about wasted bandwidth, that should not be an issue.
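Put together, a fully self-contained page would look something like this minimal sketch (the CSS, script and base64 data are just placeholders):

<!DOCTYPE html>
<html>
<head>
<style>
/* contents of style.css inlined here */
body { font-family: sans-serif; }
</style>
<script>
/* contents of script.js inlined here */
document.addEventListener("DOMContentLoaded", function () { /* ... */ });
</script>
</head>
<body>
<img src="data:image/png;base64,iVBORw0KGgo..." alt="Red dot" />
</body>
</html>

The whole page now arrives in a single response, at the cost of none of it being cacheable separately.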

Now, if you really cared, you could even make your web scripts smart enough to get the best of both worlds: on the first request (when the user does not have a cookie), send everything (CSS, JavaScript, images) embedded in a single HTML file as described above, add <link rel="prefetch"> tags for the external copies of the files, and set a cookie. If the user already has the cookie (e.g. they have visited before), send them just normal HTML with <img src="example.jpg">, <link rel="stylesheet" type="text/css" href="style.css">, etc.

So on the first visit the browser would request just a single HTML file and get and show everything. Then, when idle, it would preload the specified external CSS, JS and images. On later visits the browser would request and receive only the resources that had changed (probably just a new HTML file).

The extra CSS + JS + image data would only ever be sent twice, even if you clicked hundreds of times around the website; much better than hundreds of times, which is what your proposed solution would require. And it would never (not on the first visit, nor on later ones) add more than one latency-increasing round trip.
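A minimal sketch of that first-visit/repeat-visit logic, written here as a hypothetical Node/TypeScript handler (the file names, the cookie name and the pre-built inlined page are all invented for illustration):

import { createServer } from "http";
import { readFileSync } from "fs";

createServer((req, res) => {
  // The cookie name "seen" is arbitrary; anything that survives between visits will do.
  const isReturningVisitor = (req.headers.cookie ?? "").includes("seen=1");

  res.setHeader("Content-Type", "text/html");

  if (isReturningVisitor) {
    // Repeat visit: plain HTML with external <link>/<script>/<img> references;
    // the browser serves them from its cache or revalidates them with ETags.
    res.end(readFileSync("index.html", "utf8"));
  } else {
    // First visit: one self-contained response (CSS/JS inlined, images as data URIs,
    // plus <link rel="prefetch"> tags for the external copies), and a cookie
    // so the next request can be told apart.
    res.setHeader("Set-Cookie", "seen=1; Max-Age=31536000; Path=/");
    res.end(readFileSync("index.inlined.html", "utf8"));
  }
}).listen(8080);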

Now, if that sounds like too much work and you don't want to switch to another protocol like SPDY, there are already modules like mod_pagespeed for Apache which can do some of that work for you automatically (merging multiple CSS/JS files into one, auto-inlining small CSS and minifying it, creating small inlined placeholder images while the originals load, lazy-loading images, etc.) without requiring you to modify a single line of your webpage.



 

@Gretchen104

Your web browser doesn't know about the additional resources until it downloads the web page (HTML) from the server, which contains the links to those resources.

You might be wondering, why doesn't the server just parse its own HTML and send all the additional resources to the web browser during the initial request for the web page? It's because the resources might be spread across multiple servers, and the web browser might not need all those resources since it already has some of them cached, or may not support them.

The web browser maintains a cache of resources so it does not have to download the same resources over and over from the servers that host them. When navigating different pages on a website that all use the same jQuery library, you don't want to download that library every time, just the first time.

So when the web browser gets a web page from the server, it checks what linked resources it DOESN'T already have in the cache, then makes additional HTTP requests for those resources. Pretty simple, very flexible and extensible.

A web browser can make a limited number of HTTP requests in parallel (HTTP/1.1 originally recommended two connections per server; modern browsers open more). This is not unlike AJAX: both are asynchronous ways of loading content. With keep-alive we can make several requests over one connection, and with pipelining we can make several requests without having to wait for responses. Both techniques are very fast, because most of the overhead usually comes from opening and closing TCP connections.
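As a rough illustration, with a persistent connection the later requests reuse the TCP connection the first one opened, and with pipelining they can even be written out before the first response has arrived (headers trimmed for brevity):

GET /index.html HTTP/1.1
Host: example.com
Connection: keep-alive

GET /style.css HTTP/1.1
Host: example.com
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: text/html
...

HTTP/1.1 200 OK
Content-Type: text/css
...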





A bit of web history...

Web pages started out as plain-text email, with computer systems engineered around this idea, forming a somewhat free-for-all communication platform; web servers were still proprietary at the time. Later, more layers were added to the "email spec" in the form of additional MIME types, such as images, styles, scripts, etc. After all, MIME stands for Multipurpose Internet Mail Extensions. Sooner or later we had what is essentially multimedia email communication, standardized web servers, and web pages.


HTTP requires that data be transmitted in the context of email-like
messages, although the data most often is not actually email.


As technology like this evolves, it needs to allow developers to progressively incorporate new features without breaking existing software. For example, when a new MIME type is added to the spec - let's say JPEG - it will take some time for web servers and web browsers to implement that. You don't just suddenly force JPEG into the spec and start sending it to all web browsers, you allow the web browser to request the resources that it supports, which keeps everyone happy and the technology moving forward. Does a screen reader need all the JPEGs on a web page? Probably not. Should you be forced to download a bunch of Javascript files if your device doesn't support Javascript? Probably not. Does Googlebot need to download all your Javascript files in order to index your site properly? Nope.

Source: I've developed an event-based web server like Node.js. It's called Rapid Server.

References:


HTTP persistent connection (keep-alive)
HTTP pipelining
HTTP/1.1 Connections (RFC 2616)


Further reading:


Is SPDY any different than http multiplexing over keep alive connections
Why HTTP/2.0 does not seem interesting



 

@Gloria169

Because it doesn't assume that these things are actually required.

The protocol doesn't define any special handling for any particular type of file or user-agent. It does not know the difference between, say, an HTML file and a PNG image. In order to do what you're asking, the Web server would have to identify the file type, parse it out to figure out what other files it's referencing, and then determine which other files are actually needed, given what you intend to do with the file. There are three big problems with this.

The first problem is that there is no standard, robust way to identify file types on the server end. HTTP manages via the Content-Type mechanism, but that doesn't help the server, which has to figure this stuff out on its own (partly so that it knows what to put into the Content-Type). Filename extensions are widely supported, but fragile and easily fooled, sometimes for malicious purposes. Filesystem metadata is less fragile, but most systems don't support it very well, so the servers don't even bother. Content sniffing (as some browsers and the Unix file command try to do) can be robust if you're willing to make it expensive, but robust sniffing is too expensive to be practical on the server side, and cheap sniffing isn't robust enough.

The second problem is that parsing a file is expensive, computationally speaking. This ties into the first one somewhat, in that you'd need to parse the file in a bunch of different potential ways if you wanted to sniff the content robustly, but it also applies after you've identified the file type, because you need to figure out what the references are. This isn't so bad when you're only doing a few files at a time, like the browser does, but a Web server has to handle hundreds or thousands of requests at once. This adds up, and if it goes too far, it can actually slow things down more than multiple requests would. If you've ever visited a link from Slashdot or similar sites, only to find that the server is agonizingly slow due to high usage, you've seen this principle in action.

The third problem is that the server has no way to know what you intend to do with the file. A browser might need the files being referenced in the HTML, but it might not, depending on the exact context in which the document is being processed. That would be complex enough, but there's more to the Web than just browsers: between spiders, feed aggregators, and page-scraping mashups, there are many kinds of user-agents that have no need for the files being referenced in the HTML: they only care about the HTML itself. Sending these other files to such user-agents would only waste bandwidth.
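To make those three problems concrete, here is a rough sketch of what such a "push everything" server would have to run for every HTML document it serves; the helpers are hypothetical and deliberately naive:

import { readFileSync } from "fs";

// Problem 1: identifying the type by extension is fragile and easily fooled.
function guessContentType(path: string): string {
  if (path.endsWith(".html")) return "text/html";
  if (path.endsWith(".css")) return "text/css";
  if (path.endsWith(".png")) return "image/png";
  return "application/octet-stream";
}

// Problem 2: the server must parse every document it serves; a real parser
// is far more expensive than this naive regular expression.
function referencedResources(html: string): string[] {
  const refs: string[] = [];
  for (const match of html.matchAll(/(?:src|href)="([^"]+)"/g)) {
    refs.push(match[1]);
  }
  return refs;
}

// Problem 3: even with the list in hand, the server still cannot know which
// of these the client actually wants, supports, or already has cached.
const page = readFileSync("index.html", "utf8");
console.log(referencedResources(page).map(guessContentType));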

The bottom line is that figuring out these dependencies on the server side is more trouble than it's worth. So instead, they let the client figure out what it needs.



 

@Shelley277

HTTP/2 is based on SPDY and does exactly what you suggest:


At a high level, HTTP/2:


is binary, instead of textual
is fully multiplexed, instead of ordered and blocking
can therefore use one connection for parallelism
uses header compression to reduce overhead
allows servers to “push” responses proactively into client caches



More is available in the HTTP/2 FAQ.
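As a concrete hint of what push looks like in practice: many HTTP/2-capable servers (Apache's mod_http2, for example) can be told to push resources via Link response headers on the HTML; whether the resources are actually pushed or merely preloaded by the browser depends on the server and its configuration:

Content-Type: text/html
Link: </style.css>; rel=preload; as=style
Link: </image.jpg>; rel=preload; as=image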



 

@BetL925

The short answer is "Because HTTP wasn't designed for it".

Tim Berners-Lee did not design an efficient and extensible network protocol. His one design goal was simplicity. (The professor of my networking class in college said that he should have left the job to the professionals.) The problem that you outline is just one of the many problems with the HTTP protocol. In its original form:


There was no protocol version, just a request for a resource
There were no headers
Each request required a new TCP connection
There was no compression


The protocol was later revised to address many of these problems:


Requests were versioned; they now look like GET /foo.html HTTP/1.1
Headers were added for meta information in both requests and responses
Connections were allowed to be reused with Connection: keep-alive
Chunked responses were introduced to allow connections to be reused even when the document size is not known ahead of time.
Gzip compression was added


At this point HTTP has been taken about as far as it can go without breaking backwards compatibility.
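For reference, a single HTTP/1.1 exchange showing those later additions together (illustrative values):

GET /foo.html HTTP/1.1
Host: example.com
Accept-Encoding: gzip
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Transfer-Encoding: chunked
Connection: keep-alive

...gzip-compressed body sent in chunks; the connection stays open for the next request...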

You are not the first person to suggest that a page and all its resources should be pushed to the client. In fact, Google designed a protocol that can do so, called SPDY.

Today both Chrome and Firefox can use SPDY instead of HTTP with servers that support it. From the SPDY website, its main features compared to HTTP are:



SPDY allows client and server to compress request and response headers, which cuts down on bandwidth usage when the similar headers (e.g. cookies) are sent over and over for multiple requests.
SPDY allows multiple, simultaneously multiplexed requests over a single connection, saving on round trips between client and server, and preventing low-priority resources from blocking higher-priority requests.
SPDY allows the server to actively push resources to the client that it knows the client will need (e.g. JavaScript and CSS files) without waiting for the client to request them, allowing the server to make efficient use of unutilized bandwidth.



If you want to serve your website with SPDY to browsers that support it, you can do so. For example Apache has mod_spdy.

SPDY has become the basis for HTTP/2 and its server push technology.



 

@Kristi941

Because they do not know what those resources are. The assets a web page requires are coded into the HTML. Only after a parser determines what those assets are can they be requested by the user-agent.

Additionally, once those assets are known, they need to be served individually so the proper headers (i.e. Content-Type) can be sent with each one and the user-agent knows how to handle it.
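For instance, the three resources from the original question would come back as three separate responses, each with its own Content-Type header telling the user-agent how to handle it:

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8

HTTP/1.1 200 OK
Content-Type: text/css

HTTP/1.1 200 OK
Content-Type: image/jpeg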


