--- In exceptional-performance@yahoogroups.com, "Chris Korhonen"
<ckorhonen@...> wrote:
>
> In larger organizations it is often still the case where the web
> developers do not have access to the production servers, or to their
> configuration - you'll find that the actual hardware is managed by
> another team, or even outsourced to another company, with possibly
> draconian procedures in place for making such a low-level change.

Here's what I've found, working for a large site:
1) There is a constant tension between the content owners, who want to
see their updates take effect on the live site instantly, and the
operations folks, who want a reasonable caching rule in place. What you
usually end up with is a compromise where static content is cached for
no more than 5-15 minutes on the off chance that the content owner
updates it.
2) Since static content may be served from the company's servers or
CDN1 or CDNx at any given hour, depending on bandwidth commitments and
current prices, the effective cache lifetime may go from 15 minutes
when served by the company directly to double that, 30 minutes, when
served from a CDN. This means that caching directives need to be halved
on the off chance that the content will be served from a CDN.
Combine one and two, and you get caching rules that are one quarter as
efficient as they could be.
What we are now in the process of implementing is a strategy where we
identify key files and directories that have a global impact on our
site (the low-hanging fruit) and cache them with a far-future Expires
header. In order to invalidate the end user's browser cache, and the
cache on the CDN, whenever the file gets updated, we literally change
the URI for the resource.
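
As a rough illustration of what "far-future" means in header terms,
here is a minimal Python sketch. The one-year lifetime, the function
name, and the extra Cache-Control header are assumptions made for the
example, not a description of our actual configuration.

```python
import time
from email.utils import formatdate

# Assumed lifetime for illustration: one year, expressed in seconds.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def far_future_headers(now=None, lifetime=ONE_YEAR_SECONDS):
    """Return response headers that let caches keep a file for `lifetime`.

    `now` is a Unix timestamp; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    return {
        # HTTP-date in GMT, e.g. "Fri, 01 Jan 1971 00:00:00 GMT"
        "Expires": formatdate(now + lifetime, usegmt=True),
        # Hypothetical companion header; the post itself only mentions Expires.
        "Cache-Control": "public, max-age=%d" % lifetime,
    }
```

Headers like these are only safe because the URI changes whenever the
file does; a cached copy under the old URI is simply never requested
again.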
On the server, we configure the Web servers to strip the leading
directory out of the request path if it matches a certain pattern (say,
ver-YYYYMMDD-sequence) and serve the physical resource without that
directory. If the request is for
http://www.foo.com/ver-20081128-0/fff.gif, the server will send the
contents of http://www.foo.com/fff.gif (without redirecting).
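
The path rewrite described above can be sketched in a few lines of
Python. This is only an illustration: the real thing lives in the Web
server configuration, and the exact pattern (eight digits for the date,
one or more digits for the sequence) and the function name are
assumptions.

```python
import re

# Assumed shape of the version directory: /ver-YYYYMMDD-sequence
# The lookahead requires a following "/", so only a whole leading
# directory is ever stripped.
VERSION_PREFIX = re.compile(r"^/ver-\d{8}-\d+(?=/)")


def strip_version_prefix(path):
    """Map a versioned request path to the physical resource path.

    Paths without a leading version directory are returned unchanged.
    """
    return VERSION_PREFIX.sub("", path, count=1)
```

With this, a request path of /ver-20081128-0/fff.gif maps to /fff.gif,
while an unversioned path such as /images/fff.gif passes through
untouched.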
For the pages, we maintain a list of the last-updated date, and the
sequence number for that date, for each of the files that we care
about, and make that list available to the code that builds the pages.
In the page content, we abstract the URLs of those files so that they
include the information from that list.
That way, a file can get updated and published live without anyone
seeing it. Then the list gets updated and distributed. The pages then
pick up the information from the new list and insert the updated
date/sequence when the page is served, thus serving a "new" file
instead of the old one.
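
A minimal sketch of the page-side half, assuming the list is available
to the page code as a simple mapping from file path to (date, sequence).
The names VERSIONS and versioned_url, and the choice to fall back to the
plain URL for unlisted files, are hypothetical.

```python
# Hypothetical version list: path -> (last-updated date, sequence number).
VERSIONS = {
    "/fff.gif": ("20081128", 0),
}


def versioned_url(path, versions=VERSIONS, host="http://www.foo.com"):
    """Build the URL a page should emit for a static file.

    Files that are not in the list get their plain, unversioned URL.
    """
    entry = versions.get(path)
    if entry is None:
        return host + path
    date, sequence = entry
    return "%s/ver-%s-%d%s" % (host, date, sequence, path)
```

Publishing a second update on the same day would change the entry to
("20081128", 1), and every page built afterwards would emit
http://www.foo.com/ver-20081128-1/fff.gif, a URI that no browser or CDN
has cached yet.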
So far it seems to be working perfectly as long as all references to
that file have had the URL abstracted and as long as the list gets
distributed.