Rediscovering Web Architecture from first principles
= Addressing =

The Web is first and foremost a publishing platform. It is used to
publish information and services.

We will start with the question of the best way to publish information.
There are two basic strategies possible: one is to describe uniquely for
each information object how to fetch it. Another is to standardize the
fetching mechanism and instead specify only the minimal information
required to actually do the fetch: an "address".

The older FTP protocol uses the first model. When you wanted someone to
download something through FTP you would give them instructions about
how to log into an FTP site, switch to the appropriate directory and get
the appropriate file. Conversely, HTTP leaves only one free parameter:
the address. Everything else falls out of that. You do not need to
choose the method that is used: it is always GET. You do not need to
decide what server to issue the command to: the address of the
recommended server is embedded in the address itself (the "Uniform Resource
Identifier", or "URI").

This leads us to the Web principle that all resources should be
identifiable by a unique address, so that they can be fetched with
nothing more than GET and the address.
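To make the contrast with FTP-style instructions concrete, here is a minimal sketch, using only Python's standard `urllib.parse`, of how a client derives everything it needs for a fetch from the address alone. It does not open a connection, and the `example.com` URIs are hypothetical.

```python
from urllib.parse import urlsplit


def plan_get(uri: str) -> dict:
    """Derive host, port and request line for an HTTP fetch from the URI alone."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP URI: {uri}")
    # The port defaults from the scheme; no out-of-band instructions needed.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # The request target is the path plus any query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {
        "host": parts.hostname,
        "port": port,
        # The method is not a free parameter for retrieval: it is always GET.
        "request_line": f"GET {target} HTTP/1.1",
    }


print(plan_get("http://www.example.com/orders/42?format=xml"))
```

Compare this with an FTP session, where the login, directory changes and file name would all have to be communicated separately.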

= Uniform Interface =

Standardizing the means of information download is important for
maximizing the number of applications that can download information
without necessarily understanding the semantics of the information they
are downloading. Examples include caching and prefetching
proxies, command line downloading tools, web browsers, spiders and XSLT
engines.

Any architecture that has multiple ways of saying "get me some
information" will experience interoperability problems that the Web
avoids by having a few, globally optimized ways of doing so (basically
HTTP GET and FTP retrieval).

Because HTTP's GET is used by such a wide variety of tools, it is
necessary that it have very clear semantics. For instance, a GET cannot
be interpreted as signing a user up for a service; otherwise a
prefetching cache or spider could sign the user up just by doing its
job. Therefore the Web says that end-users are not responsible for any
side effects of GET. Services should not use GET in ways that cause
side effects.
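The signup hazard can be illustrated with a small simulation. This is a sketch under stated assumptions: `unsafe_handler`, `safe_handler`, `prefetch` and the `/subscribe` link are hypothetical stand-ins for a server and for a prefetching cache or spider, not any real framework's API.

```python
from urllib.parse import parse_qs, urlsplit

subscribers = set()  # hypothetical server-side state


def unsafe_handler(method, uri, body=""):
    """Anti-pattern: any request to the link signs someone up, GET included."""
    # The method is deliberately ignored, which is exactly the problem.
    email = parse_qs(urlsplit(uri).query).get("email", [""])[0]
    subscribers.add(email)
    return 200


def safe_handler(method, uri, body=""):
    """GET only returns a signup form; the side effect requires POST."""
    if method == "GET":
        return 200  # no side effects
    if method == "POST":
        subscribers.add(body)
        return 303  # See Other: redirect the client after the side effect
    return 405  # Method Not Allowed


def prefetch(handler, links):
    """What a prefetching cache or spider does: GET every link it sees."""
    for link in links:
        handler("GET", link)


page_links = ["/subscribe?email=bob@example.com"]  # hypothetical link on a page
prefetch(unsafe_handler, page_links)
print(subscribers)  # bob was signed up without ever asking
subscribers.clear()
prefetch(safe_handler, page_links)
print(subscribers)  # empty: crawling had no side effect
```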

Let me stress a point made earlier: any architecture that allows every
end-node to choose its own mechanism for delivering information (SOAP
RPC, CORBA, ...) is strictly less powerful and less interoperable than
an architecture that has a single, standardized way to deliver
information because it means that every potential consumer of
information resources must be customized to talk to every publisher of
information. This is an interoperability nightmare.

The consensus on this point is sufficiently strong that the W3C has
extended SOAP 1.2 to support HTTP GET so that it can better take
advantage of the features of web architecture.

= Hypermedia =

But simply returning named information is not always enough. Often the
client application or human web surfer will not know exactly what
information they need in advance. Essentially they need to be given a
menu of options so that they may choose the appropriate one. For
instance a flight-purchasing agent might be presented with a list of
flights and might choose the "best" one based on a complex algorithm
that depends upon price, flight length and flight time. In this case, we
need an information object (a flight list) that refers to other
information objects (the flights). A data representation that has
references embedded among data is known in Web terminology as "hypermedia".

Listing things is a wonderful way of helping a client navigate to
information because the client knows exactly what is available, and
sometimes knowing the list is as important as, or more important than,
having access to any particular item.

Google is an example of an application that does not really care much
about any particular page on your site, but cares very much about
ensuring that it has a relatively complete list of pages. So hypermedia
is a very important part of web architecture because it allows
discovery. Where possible, information delivered on the Web should be
organized into webs of hyperlinked hypermedia documents.

Because the Web is basically a flat, global address space (modulo
machines behind firewalls etc.), any resource can refer to any other
resource, no matter whether they use the same data representation or
even the same access protocol (e.g. HTTP versus FTP). This promiscuous
connectedness is what makes the Web so great and important!
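As a rough illustration of discovery through hypermedia, the sketch below uses the standard library's `html.parser` to pull the links out of a flight list and resolve them against the list's own URI. The flight list markup and all `example.com`/`example.net` addresses are hypothetical; note that the third link points at a different host, which the client handles no differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element, resolved against a base URI."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Relative references become full URIs usable anywhere.
                self.links.append(urljoin(self.base_uri, href))


flight_list = """
<ul>
  <li><a href="flights/ac101">AC 101</a></li>
  <li><a href="flights/ua202">UA 202</a></li>
  <li><a href="http://partner.example.net/flights/x9">Partner flight</a></li>
</ul>
"""

collector = LinkCollector("http://travel.example.com/flights-list")
collector.feed(flight_list)
for uri in collector.links:
    print(uri)  # each one can now be fetched with a plain GET
```

A flight-purchasing agent would then GET each discovered URI and apply its own ranking; the list itself tells it what exists.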

The great thing about using hypermedia as an organizational model for
information repositories and services is that one process or individual
can direct another process or individual to look at and deal with a
particular resource by address and can choose the appropriate amount of
context. For instance, you can direct a purchase order processing
application to a particular purchase order rather than to a whole
"purchase order application". This is in contrast to the dominant web
services methodologies, which hide many resources behind a single URI and
therefore make addressing those individual resources impossible.
Imagine if someone wanted to send you a link to a Wall Street Journal
article and you were required to always go to the front page!

= Queries =

Sometimes the information provider has such a large database of
information that it is not practically (or economically) feasible for it
to deliver to the requestor in total or even to segment into many
hypermedia documents. The requestor needs to specify some reasonable
filter. This is a fallback position because it becomes difficult or
impossible for the requestor to enumerate all of the information items
even if this might be useful or important. Nevertheless, reality
requires that sometimes filtering happen on the server side and the Web
allows this through HTTP URI "query parameters".

The beautiful part of query parameters is that they are expressed as
part of the URI so that all of the tools we've built up for downloading
ordinary hyperdocument resources and lists of hyperdocument resources
can also be used on filtered lists of hyperdocument resources. The only
difference is that in this case the URI is partially constructed by the
client application, under instructions from the server, rather than
being constructed entirely by the server and being totally opaque to the
client. But once the URI is constructed, it is as bona fide a URI as any
other and may be put into any slot expecting a URI. It can be
"promiscuously connected", cached, downloaded with command line tools,
and so forth.
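Here is a minimal sketch of a client building a query URI "under instructions from the server". The `search_form` dictionary is a hypothetical stand-in for what an HTML `<form method="get">` would convey (an action URI plus field names); `build_query_uri` is an illustrative helper, not a library function.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical instructions published by the server.
search_form = {
    "action": "http://travel.example.com/flights",
    "fields": ["from", "to", "date"],
}


def build_query_uri(form, values):
    """Fill in the server-advertised fields and return an ordinary URI."""
    missing = [f for f in form["fields"] if f not in values]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Only the advertised fields are used, in the order the server listed them;
    # urlencode takes care of escaping.
    query = urlencode([(f, values[f]) for f in form["fields"]])
    return f"{form['action']}?{query}"


uri = build_query_uri(search_form, {"from": "YVR", "to": "SFO", "date": "2002-09-20"})
print(uri)
# The result parses like any other URI and can be linked, cached or bookmarked.
print(parse_qs(urlsplit(uri).query))
```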

= Representations =

As the Web evolves it becomes increasingly clear that the concepts
addressed by URIs and the bits available at those URIs have distinct
lifecycles and properties. "www.theonion.com" is a news magazine. I
cannot predict what bits you will get if you dereference that address on
the day you read this essay. In order to discuss this distinction, we need words for
the thing that is addressed and the bits you get by going there on a
particular day, with a particular user agent, etc. The concept is known
as a "resource". The bits are known as a "representation".

Representations have media types. Resources do not. Insofar as
resources are "typed" (and HTTP has nothing to say about this) you might
say that they are typed by RDF. The distinction between representation
and resource allows us a very powerful form of extensibility and
adaptability. The Web has a feature called content negotiation which
allows resource consumers to ask for the resource in a variety of
different representations. One representation might be WML optimized for
hand-held computers. Another might be XHTML optimized for browsers.
Others might be PDF optimized for printers and RDF optimized for machine
discovery. As new standards come into existence, they can be served as
new representations for the information, enabling evolution of the
resource without backwards compatibility problems.
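Content negotiation can be sketched as choosing among a resource's available representations using the client's `Accept` header. This is a simplified illustration, not a complete implementation of the HTTP rules: it honours `q` values and prefers the most specific matching range, but ignores other media-type parameters. The list of available representations is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_range:
            ranges.append((media_range, q))
    return ranges


def quality(media_type: str, ranges: List[Tuple[str, float]]) -> float:
    """q-value of the most specific range matching media_type (0 if none)."""
    mtype, _, msub = media_type.lower().partition("/")
    best_spec, best_q = -1, 0.0
    for media_range, q in ranges:
        rtype, _, rsub = media_range.partition("/")
        if media_range == "*/*":
            spec = 0
        elif rtype == mtype and rsub == "*":
            spec = 1
        elif rtype == mtype and rsub == msub:
            spec = 2
        else:
            continue
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the representation the client prefers; None means 406 Not Acceptable."""
    ranges = parse_accept(accept or "*/*")  # no Accept header: anything goes
    best, best_q = None, 0.0
    for media_type in available:  # ties go to the server's first listed type
        q = quality(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best


# Hypothetical representations of a single resource.
AVAILABLE = ["application/xhtml+xml", "text/vnd.wap.wml",
             "application/pdf", "application/rdf+xml"]

print(negotiate("text/vnd.wap.wml", AVAILABLE))                # hand-held device
print(negotiate("application/rdf+xml, */*;q=0.1", AVAILABLE))  # machine agent
```

Adding a new representation later means appending to the list; existing clients keep receiving what they asked for.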

This is a good point to emphasize something that has so far been
implicit. There is nothing in the Web architecture which is specific to
interactions with a human being at one end and a machine at the other.
Existing information resources can be made machine-accessible by adding
an XML (or XML/RDF) representation alongside existing HTML
representations. Human beings are one kind of client for web resources.
Machines are another.

= Services =

So far we have described the Web as an information publishing platform.
But over the years it has also become the world's leading service
publishing platform. There are services for bidding on auctions, booking
flights, generating insults and anything else you might dream of. The
biggest distinction between publishing information and publishing a
service is that the service may require the service provider and client
working together to generate new resources, change existing resources or
delete resources. In other words the conversation is two-way rather than
one-way.

Now before we go into detail on services I want to point out that an
important sub-task in publishing almost any service is publishing
information either held by or relating to that service. There is a
tendency to forget this when developing services using technologies such
as SOAP and XML-RPC. For instance a part of a stock purchasing service
might include a means to get the stock price. The traditional
SOAP/XML-RPC way to do this is to invent a new method called
getStockPrice. I have already discussed the interoperability limitations
of this model. So the first thing to remember about publishing services
is that whatever you do, do not neglect the information publishing part
of your service. If it can benefit from the web architecture features
described above, use them.

The second important thing to remember about publishing services is that
publishing information is crucial for the flexibility, reliability and
scalability of your service. Let's deal with each of these in turn.
Imagine a service that allows the client and server to work together to
generate a purchase order. Now they need to decide what to do with the
generated purchase order. One option is that each of them can give the
order a unique identifier (e.g. "PO number") and maintain a local copy
of it. This arrangement makes it difficult to bring third parties into
the conversation and to utilize URI-aware tools like spiders, caches,
inferencers and XSLT transformation engines. It would be better to give
the purchase order its own URI. Generally, if there is information that
is of interest to both or all participants in a conversation, that
information should be given a URI so that new participants can be easily
brought in after the fact.

Another aspect of flexibility is allowing a variety of different kinds
of clients. As long as the purchase order has a URI, the client can
decide how stateful or stateless it should be. If it wants to keep a
copy of the purchase order it can (more robust and paranoid clients
will). But if it wants to let the server manage it, it can merely keep
the URI and refresh from the server when it needs information. In a case
where the server is allowed to change the information resource
unilaterally (unlike a purchase order), the client can always get the
latest version of the resource using a GET.

Emphasizing the information publishing portion of your service is also
important for reliability reasons. Continuing with the purchase order
example, consider what would happen if the client party missed a message
or had its state corrupted during the communication. As long as all
relevant information has been exposed on the server as URIs, it could
rebuild its state with nothing more than the URI representing the
resource. Conversely, in situations where the state of the conversation
is implicit, one lost message can throw the client irreconcilably out of
sync with the server. Of course if the service consumer somehow loses
the URI for the thing it is talking about (e.g. the purchase order URI)
then you are in trouble. But even then you can use discovery and query
techniques to re-establish contact. For this and other reasons, resource
discovery should always be an important part of service design.

Finally there is the issue of scalability. It is often the case that
once an information resource has been created, a representation of it is
retrieved many times. Even a purchase order may be read over and over
again by internal and external auditors and order fulfillment systems.
Using standard web techniques, these representations can be cached.
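One standard technique is the conditional GET: the server labels each representation with an entity tag (`ETag`), and a client that already holds a copy sends `If-None-Match`, receiving a bodiless `304 Not Modified` if nothing changed. The sketch below simulates this in memory; `Origin` and `CachingClient` are hypothetical classes, and a real HTTP cache would also use freshness lifetimes (`Cache-Control: max-age`) to skip the round trip entirely.

```python
import hashlib


class Origin:
    """Hypothetical origin server; counts how many full bodies it sends."""

    def __init__(self):
        self.docs = {}
        self.full_transfers = 0

    def put(self, uri, body: bytes):
        self.docs[uri] = body

    def get(self, uri, if_none_match=None):
        body = self.docs[uri]
        etag = '"' + hashlib.sha1(body).hexdigest() + '"'
        if if_none_match == etag:
            return 304, etag, None  # Not Modified: no body on the wire
        self.full_transfers += 1
        return 200, etag, body


class CachingClient:
    """Keeps (etag, body) per URI and revalidates with If-None-Match."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, uri):
        cached_etag = self.cache.get(uri, (None, None))[0]
        status, etag, body = self.origin.get(uri, if_none_match=cached_etag)
        if status == 304:
            return self.cache[uri][1]
        self.cache[uri] = (etag, body)
        return body


origin = Origin()
origin.put("/orders/42", b"<po>2 pairs of sneakers</po>")  # hypothetical PO
auditor = CachingClient(origin)
for _ in range(100):
    auditor.get("/orders/42")
print(origin.full_transfers)  # the body crossed the wire only once
```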

So far, I've tried to show how information publishing is a crucial part
of all service publishing projects and thus to show that the service
publishing problem is an _extension_ of the information publishing
problem, not a different problem altogether. Next we will describe how
we can extend the web architecture into service publishing.

= Resource construction and mutation =

Consider a service like buying sneakers. There are three reasons that we
need to move beyond the GET-based web we have described. First, we will
want to create purchase orders, so we need a way to create objects.
Second, we need information to flow from the customer to the service
provider rather than vice versa. Third, we need to allow the "side
effect" of actually shipping the shoes.

The HTTP method designed for creating and mutating objects, with
possible side effects, is POST. POST is neither more nor less powerful
than GET. It is just different. GET's safety (side-effect-freeness)
means that clients have extreme flexibility in structuring applications
that rely heavily on GET. In particular, GET allows very "declarative"
applications that say what needs to be done but do not provide any
instructions on how to do it. For instance if an XSLT stylesheet needs
to deal with an XML element inside of a web resource it can retrieve
that document once, or ten times, or a hundred times, at its own
discretion. It might choose to do it once to optimize for bandwidth or a
hundred times to optimize for client-side memory space. If the XSLT is
using multiple documents, it can also choose the order in which it retrieves
them at its own discretion.

But if you do need side effects, then you need to give up some of GET's
advantages. Then you need POST. Because POST invocations may have side
effects, you must be very careful about the order in which you invoke
POST methods. You need to add the shoes to the shopping cart before you
check out of the online store, and not vice versa.
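The difference in freedom can be modelled with plain method calls standing in for HTTP requests. In this hypothetical `Store`, the `get_` method is safe and can be repeated or reordered at will, while the `post_` methods have side effects, so both their order and their count matter: checking out first fails, and a naive retry of "add to cart" buys two pairs.

```python
class Store:
    """Hypothetical online store; post_* methods model POST side effects."""

    def __init__(self):
        self.cart = []
        self.orders = []

    def get_cart(self):
        # Safe (GET-like): repeat or reorder freely, state never changes.
        return list(self.cart)

    def post_add_to_cart(self, item):
        # Unsafe (POST-like): every call changes state.
        self.cart.append(item)

    def post_checkout(self):
        if not self.cart:
            raise RuntimeError("cannot check out an empty cart")
        self.orders.append(list(self.cart))
        self.cart.clear()
        return len(self.orders)  # order number


store = Store()
for _ in range(3):
    store.get_cart()             # harmless, any number of times
store.post_add_to_cart("sneakers")
print(store.post_checkout())     # correct order: add, then check out

retrying = Store()
retrying.post_add_to_cart("sneakers")
retrying.post_add_to_cart("sneakers")  # a blind retry duplicates the effect
print(retrying.get_cart())
```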

POST has another related strength/weakness. The input to GET is very
simple: basically just an address. The Web infrastructure strongly
encourages you to move retrievable information into an addressable URI.
But when the service provider and client are working together to create
new resources or modify existing ones, they both need to contribute
information. This requires a higher level of coordination which makes
POST-based integration more difficult than GET-based integration. But
this is the price of solving the more difficult problem of building
information resources rather than simply delivering them.

POST and GET work together in important ways. Using GET-based navigation
you can find the service you want to invoke. It could even report its
quality of service characteristics, terms and conditions and so forth.
Then you use POST to invoke it. This will usually either mutate or
create a resource. This resource has a URI that you can use GET to
retrieve whenever you need it. You can also refer third parties at the
resource via its URI and they can GET it. There is no need to coordinate
who GETs first or how many times you GET because GET is safe and idempotent.
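The POST-then-GET pattern can be sketched as a tiny WSGI application driven directly in-process with the standard library's `wsgiref.util.setup_testing_defaults`, so no server or network is needed. The `/orders/` collection, the JSON body and the `request` helper are hypothetical. The POST returns `201 Created` with the new resource's URI in the `Location` header (here a relative reference, which current HTTP specifications permit), and anyone holding that URI can then GET it.

```python
import io
import json
from wsgiref.util import setup_testing_defaults

orders = {}  # hypothetical in-memory store: path -> order


def app(environ, start_response):
    method, path = environ["REQUEST_METHOD"], environ["PATH_INFO"]
    if method == "POST" and path == "/orders/":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        order = json.loads(environ["wsgi.input"].read(size))
        location = f"/orders/{len(orders) + 1}"
        orders[location] = order
        # 201 Created: the new resource's URI comes back in Location.
        start_response("201 Created", [("Location", location)])
        return [b""]
    if method == "GET" and path in orders:
        start_response("200 OK", [("Content-Type", "application/json")])
        return [json.dumps(orders[path]).encode()]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def request(method, path, body=b""):
    """Call the app in-process with a minimal WSGI environ."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": method, "PATH_INFO": path,
                    "CONTENT_LENGTH": str(len(body)),
                    "wsgi.input": io.BytesIO(body)})
    result = {}

    def start_response(status, headers):
        result["status"], result["headers"] = status, dict(headers)

    result["body"] = b"".join(app(environ, start_response))
    return result


created = request("POST", "/orders/", json.dumps({"item": "sneakers"}).encode())
uri = created["headers"]["Location"]
print(created["status"], uri)        # the buyer learns the new URI
print(request("GET", uri)["body"])   # an auditor can GET it, any number of times
```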

POST can handle any operation which changes server-side state in a
manner that would be inappropriate for GET. But there are two operations
that have pretty clear semantics that can be separated out from the mass
of POST-based actions. Sometimes you have a URI and you want to
overwrite its content. For instance you load a document into your word
processor, make a few changes and want to save it back. Or you are
maintaining a stock quote service and it is your job to update the
quotes whenever the quoted prices change. PUT allows these. The
other operation is DELETE for destroying resources (i.e. making them
into 404s).
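A minimal in-memory sketch of PUT and DELETE semantics, assuming a hypothetical `ResourceStore` and made-up quote values: PUT replaces whatever is at the URI (creating it if absent), so repeating the same PUT leaves the same state, and after DELETE the URI answers 404.

```python
class ResourceStore:
    """Hypothetical server-side store illustrating PUT and DELETE."""

    def __init__(self):
        self.resources = {}

    def get(self, uri):
        if uri in self.resources:
            return 200, self.resources[uri]
        return 404, None

    def put(self, uri, body):
        # PUT overwrites the resource with the enclosed representation.
        created = uri not in self.resources
        self.resources[uri] = body
        return 201 if created else 204  # Created vs. No Content (replaced)

    def delete(self, uri):
        if uri not in self.resources:
            return 404
        del self.resources[uri]
        return 204


quotes = ResourceStore()
quotes.put("/quotes/XYZ", "10.00")  # first save creates the resource
quotes.put("/quotes/XYZ", "10.25")  # later saves overwrite it
quotes.put("/quotes/XYZ", "10.25")  # repeating a PUT changes nothing further
print(quotes.get("/quotes/XYZ"))
quotes.delete("/quotes/XYZ")
print(quotes.get("/quotes/XYZ"))    # now a 404
```

Because a repeated PUT or DELETE leaves the server in the same state, a client that is unsure whether its request arrived can simply send it again, which is not true of a general POST.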







Paul Prescod (papresco)
Thu Sep 12, 2002 6:06 pm