
When do browsers clear cache and how can I make more use of 304 responses?

@Nimeshi995

Posted in: #Browsers #Cache #CacheControl #HttpHeaders #Performance

I have a website with some content which changes frequently.

I would like to benefit from good caching when this content doesn't change (so I would ideally use a longer Cache-Control max-age), but would also like to ensure the latest information is used (so I would ideally use a shorter max-age). These obviously conflict.

There are two benefits to caching data:


1. Avoid making any request at all and use the version straight from the cache (if the resource is used within the Cache-Control time frame).
2. Use a 304 Not Modified response to a request for expired content that turns out to still be the latest version and is still in the browser cache (validated via the If-None-Match or If-Modified-Since request header, built from the ETag or Last-Modified header the server originally sent). This still means you need to request the data and receive a 304 response, which costs a round trip and its latency (the performance hit of which is not to be underestimated!), so it's not as good as the first option, but it does at least save on the full download and ensures you have the latest version.
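To make the two cases concrete, here is a minimal Python sketch of the decision a cache makes for a stored response. CachedEntry and plan_request are hypothetical names, and the age calculation is deliberately simplified (it ignores the Age and Date headers).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    """Hypothetical stored response: body plus the metadata a cache keeps."""
    body: bytes
    stored_at: float                     # seconds since epoch when stored
    max_age: int                         # from Cache-Control: max-age=N
    etag: Optional[str] = None           # sent back as If-None-Match
    last_modified: Optional[str] = None  # HTTP-date, sent back verbatim


def plan_request(entry: CachedEntry, now: float):
    """Return ("use-cache", body) while fresh, else ("revalidate", headers)."""
    age = now - entry.stored_at  # simplified age: time since the entry was stored
    if age < entry.max_age:
        # Benefit 1: no request at all.
        return "use-cache", entry.body
    # Benefit 2: a conditional request; a 304 reply lets us reuse entry.body.
    headers = {}
    if entry.etag is not None:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified is not None:
        headers["If-Modified-Since"] = entry.last_modified
    return "revalidate", headers
```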


So my questions are:

When an HTML page or content (CSS, JavaScript, images, fonts... etc.) is expired past its cache control header time, how long do browsers keep the content around to make use of 304 responses?

Is there anything I can do at server or application level to ensure data is available for 304 responses for a longer time? I.e. is it possible to have data cached in the browser for a long time (say 30 days) but re-requested within a much shorter time (say after 3 hours) to benefit from 304 responses more? I don't think there are any "keep in cache" HTTP headers, other than the Cache-Control header which I can only use for the shorter time, so I'm guessing this is purely down to browser implementation? I'm assuming browsers periodically clear out their cache of expired data to keep disk usage down, and I'm asking whether I have any influence over which of my website resources I'm happy for them to clear out more aggressively and which I'd prefer to advise the browser to keep around for longer in the hope that a 304 can be used later.

P.S. I'm aware you can add a timestamp or other unique code to the filename, use a long Cache-Control header, and update this unique part on file change to achieve what I want. This means that, to the browser, it looks like a new resource and hence is fetched. However, for various reasons, this is not always the easiest to implement (and can't be done on the default "index.html" page without server-side redirects).


 

@Martha676

Although an answer has been accepted here, the advice given is (IMHO) not very good.

The RFC is very clear about what a cache should do with stale content (it's not the job of the browser to cache content; the cache is considered a separate component). It should not serve up content which is considered stale without revalidating it first.

Except for very large files, returning a 304 response without new caching information is no faster than returning the full content. If the content appears multiple times on a page, the resulting load time can be much slower than loading from an empty cache! Unfortunately, it's difficult to configure most web servers to include new caching instructions on a 304 response.
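For context on that last point: when a cache gets a 304 back, it updates the stored response with whatever header fields the 304 carries (RFC 7234, section 4.3.4), so a 304 that omits Cache-Control or Expires leaves the old caching instructions in place. A simplified Python sketch; freshen is a hypothetical helper and ignores details such as choosing among several stored responses.

```python
def freshen(stored_headers: dict, not_modified_headers: dict) -> dict:
    """Return the stored response's headers updated from a 304 response.

    Header names are case-insensitive, so both sides are lower-cased.
    Fields present in the 304 replace the stored ones; all other stored
    fields (including the old Cache-Control, if the 304 omits it) are kept.
    """
    updated = {name.lower(): value for name, value in stored_headers.items()}
    for name, value in not_modified_headers.items():
        updated[name.lower()] = value
    return updated
```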

There are multiple solutions to the problem.


can't be done on the default "index.html" page without server side redirects


No, you can't change the URL, but if your caching policy is based on access time then you can update the file timestamp. Alternatively, just remove the If-None-Match and If-Modified-Since request headers. But for the reasons above, you'll get little benefit.

For videos, PDFs and sitemap.xml.gz, allowing 304 responses is probably a good thing: use ExpiresByType, LocationMatch, or a directory-specific config if they're in a separate directory.



 

@Nickens628

When you deal with static pages (such as files with the .html extension), headers can only be configured in the server configuration files, and depending on the server, the configuration options might be limited.

For more flexibility without messing up the server, use a server-side scripting language such as PHP.

Also, for Cache-Control, you want to use the must-revalidate directive so that once the cached copy expires, the browser must recheck the server for the resource before using it again.

As for the 304: when the browser requests the resource again, it sends an If-Modified-Since or If-None-Match request header (which a script sees as a server environment variable, e.g. HTTP_IF_MODIFIED_SINCE in PHP), depending on whether you use the Last-Modified or ETag header. The server (or the script executed by the server) then issues a 304 if that value shows the browser's cached copy is still current. Search for If-Modified-Since and If-None-Match on Google and you'll find more details on how those work.
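A minimal sketch of the Last-Modified / If-Modified-Since check described above, in Python rather than PHP. respond_with_last_modified is a hypothetical function, not part of any framework; it assumes last_modified is a UTC-aware datetime and ignores If-None-Match.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond_with_last_modified(last_modified: datetime,
                               if_modified_since: Optional[str]):
    """Return (status, headers) for a GET of a resource changed at last_modified."""
    # HTTP-dates have one-second resolution, so drop sub-second precision.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the condition
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # browser's copy is still current
    return 200, headers  # send the full body
```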



 

@Mendez628

When an HTML page or content (CSS, JavaScript, images, fonts... etc.)
is expired past its cache control header time, how long do browsers
keep the content around to make use of 304 responses?


While browsers may keep the content stored on disk in a temporary or cache directory beyond the expiry date/time, the browser will consider it out-of-date (not 'fresh'). It will still refer to it, and will revalidate it with the server before attempting to re-download the file. Unfortunately, the specification does not cover how browsers should deal with expired files, so the actual clean-up of the files on disk depends on the browser and on user preferences (such as a disk space limit for cached files); it can vary, and could for example happen when the browser is closed.

Mark Amery has written an excellent answer (Browser caching - prevent request that returns 304) which, while addressing a different question, explains clearly and accurately the relationship between HTTP 304 responses and browser caching behaviour.


Is there anything I can do at server or application level to ensure
data is available for 304 responses for a longer time? I.e. Is it
possible to have data cached in the browser for a long time (say 30
days) but re-requested within a much shorter time (say after 3 hours)
to benefit from 304 responses more? I don't think there are any "keep
in cache" HTTP headers, other than the Cache-Control header which I
can only use for the shorter time, so am guessing this is purely down
to Browser implementation?


I would suggest simply sending Cache-Control: public, max-age=1080, must-revalidate (1080 seconds is 18 minutes, from your comment). The browser will then certainly revalidate the content after that time, but may also at its own discretion revalidate during this time or when a user hits refresh to reload a page.
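If you want to check what those directives amount to, a naive Python parser splits the header like this. parse_cache_control is a hypothetical helper, and it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


cc = parse_cache_control("public, max-age=1080, must-revalidate")
max_age_minutes = int(cc["max-age"]) // 60  # 1080 s -> 18 minutes
```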

If you really wanted, you could also use ETag matching with an MD5 hash of the content, rather than simply checking the last modified date/time, to ensure files are not re-downloaded unless their content has actually changed. Calculating ETags for each request carries a slight performance cost if you get a lot of traffic, and many people oppose this method, partly for that reason and partly because many people don't configure it to use MD5 (you'll have to use a server-side script for this). Without content hashing it offers no real benefit over the last modified date/time check, and it can behave erratically when files are distributed from multiple servers.

Having said that, configured correctly, ETag comparison combined with long expiry times can help reduce the number of file downloads required. To make sure performance doesn't suffer, before setting up any ETags make sure your web server is configured to use persistent connections, so that requests can be chained together on one connection very quickly rather than each requiring a new connection; all modern browsers will use persistent connections if your server is set up for them.
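A rough Python sketch of MD5-based ETags with If-None-Match handling. make_etag, etag_matches and respond_with_etag are hypothetical helpers; the comparison follows the weak comparison RFC 7232 specifies for If-None-Match, and splitting on commas assumes the tags contain none (true for hex digests).

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag from the MD5 of the content, quoted as HTTP requires."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison: ignore a W/ prefix, then compare the quoted tags."""
    if if_none_match.strip() == "*":
        return True

    def opaque(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    return any(opaque(t) == opaque(etag) for t in if_none_match.split(","))


def respond_with_etag(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body): 304 with no body if the tag matches."""
    etag = make_etag(body)
    if if_none_match is not None and etag_matches(if_none_match, etag):
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```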


I'm assuming browsers periodically clear out their cache of expired
data to keep disk usage down and asking if I have any influence over
which of my website resources I'm happy to clear out more aggressively
and which I'd prefer to advise the browser to keep around for longer on
the hope that a 304 can be used later.


You can indeed specify a different cache time for different resources on your website. For example, if you have a sub-folder (say, lib) in which you store third-party libraries/code such as jQuery, RequireJS or FontAwesome, then you could include the version in the folder name (e.g. www.example.com/lib/fontawesome-4.3.0/) and set the maximum, or at least a very long, cache time for everything within this lib folder, since you will not be modifying third-party code/files directly and future versions would be introduced under new folders with the version in the name. Static resources (.js, .css, .jpg, .png, etc.) used for your website template that may change from time to time could have a shorter cache time set.

# Header is provided by mod_headers (not mod_expires); <Directory>
# sections belong in the main server or virtual host config, not .htaccess.

# Cache third party files for 1 year (31536000 seconds)
<Directory "/var/www/public_html/lib">
    <IfModule mod_headers.c>
        Header set Cache-Control "max-age=31536000"
    </IfModule>
</Directory>

# Cache template files for 1 day (86400 seconds)
<Directory "/var/www/public_html/tpl">
    <IfModule mod_headers.c>
        Header set Cache-Control "max-age=86400"
    </IfModule>
</Directory>
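If the site is served through a script rather than configured in Apache (as another answer suggests), the same per-folder policy could be sketched in Python. The URL prefixes, the CACHE_POLICIES table and the default for everything else are assumptions mirroring the config above, not anything Apache provides.

```python
# Hypothetical mirror of the Apache policy: first matching prefix wins.
CACHE_POLICIES = [
    ("/lib/", 31536000),  # versioned third-party code: 1 year
    ("/tpl/", 86400),     # template assets: 1 day
]
DEFAULT_MAX_AGE = 0  # assumption: everything else is revalidated each time


def cache_control_for(path: str) -> str:
    """Pick the Cache-Control value for a request path."""
    for prefix, max_age in CACHE_POLICIES:
        if path.startswith(prefix):
            return f"max-age={max_age}"
    return f"max-age={DEFAULT_MAX_AGE}, must-revalidate"
```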



P.S. I'm aware you can add a timestamp or other unique code to the
filename, and long cache-control header, and update this unique part
on file change to achieve what I want. This means that, to the
browser, it looks like a new resource and hence is fetched. However,
for various reasons, this is not always the easiest to implement (and
can't be done on the default "index.html" page without server side
redirects).


You're absolutely right: these are not easy to implement effectively and efficiently, and they require a lot more effort and code than using the HTTP protocol headers as they were intended. The only scenario I know of where this kind of solution is sometimes preferred is cross-platform software such as content management systems, which want to consistently support all the various web server environments. For example, if their software is written in PHP, they might implement a solution that can be achieved entirely in PHP rather than relying on .htaccess rules.
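For completeness, the filename approach from the question amounts to something like this minimal Python sketch (fingerprinted_name is a hypothetical helper, and using a truncated MD5 digest is my own choice). You would still need to rewrite every reference to the file, which is the extra effort described above.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: site.css -> site.<hash>.css."""
    p = PurePosixPath(path)
    digest = hashlib.md5(content).hexdigest()[:length]
    # The name changes whenever the content does, so a very long max-age is safe.
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```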



Added on 04-Jun-2015:

Regarding the browser behaviour of expired ('stale') cached files:


"A stored response is considered "fresh", as defined in Section 4.2, if
the response can be reused without "validation" (checking with the
origin server to see if the cached response remains valid for this
request). A fresh response can therefore reduce both latency and
network overhead each time it is reused. When a cached response is not
fresh, it might still be reusable if it can be freshened by validation
(Section 4.3) or if the origin is unavailable (Section 4.2.4)."

Source: RFC 7234: HTTP Caching, last paragraph of Introduction.
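The freshness check itself is defined in the same RFC: a response is fresh while its freshness lifetime exceeds its current age, with the age computed as in section 4.2.3. A direct Python transcription (the function names are my own; all values are in seconds):

```python
def current_age(date_value: float, age_value: float, request_time: float,
                response_time: float, now: float) -> float:
    """Age calculation from RFC 7234, section 4.2.3."""
    apparent_age = max(0, response_time - date_value)
    response_delay = response_time - request_time
    corrected_age_value = age_value + response_delay
    corrected_initial_age = max(apparent_age, corrected_age_value)
    resident_time = now - response_time
    return corrected_initial_age + resident_time


def is_fresh(freshness_lifetime: float, age: float) -> bool:
    """Section 4.2: fresh while freshness_lifetime > current_age."""
    return freshness_lifetime > age
```

With max-age=1080, a response received one second after it was sent stays fresh for roughly 1079 more seconds, after which the browser must revalidate.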


Regarding the behaviour of the must-revalidate directive:


"The "must-revalidate" response directive indicates that once it has become stale, a cache MUST NOT use the response to satisfy subsequent requests without successful validation on the origin server."

Source: RFC 7234: HTTP Caching, 5.2.2.1. must-revalidate.
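One consequence of must-revalidate worth knowing: the same section says that if a cache cannot reach the origin server, it must generate a 504 (Gateway Timeout) response rather than reuse the stale copy, whereas without the directive a disconnected cache is allowed to serve stale content (section 4.2.4). A hypothetical Python sketch of that decision; serve_stale_entry and its string results are illustrative only.

```python
from typing import Optional


def serve_stale_entry(must_revalidate: bool, origin_status: Optional[int]) -> str:
    """Decide what a cache does with a stale entry.

    origin_status is the status of the conditional request to the origin,
    or None if the origin could not be reached.
    """
    if origin_status is None:
        # Disconnected: stale content is only allowed without must-revalidate.
        return "504 Gateway Timeout" if must_revalidate else "serve stale copy"
    if origin_status == 304:
        return "serve cached copy"  # validated, so reuse the stored body
    return "serve new response"     # e.g. a 200 carrying the updated body
```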


