rest-discuss · REST Discussion Mailing List

REST/SOAP ideas summary
Message #1301 of 14748
(very rough)
== Roots of the REST/SOAP debate ==

There is one thing we can all agree upon: the REST vs. SOAP debate is
complicated. Debates about architecture are invariably complicated
because they revolve around abstract models rather than syntactic
details, and people's mental models have been shaped by very different
experiences.

This paper is intended to help readers to understand the point of
view of a REST advocate at a high level.

REST stands for REpresentational State Transfer, which is Roy
Fielding's name for the architecture of the current Web. If the Web
had had a design document in advance, it would have outlined the
principles of REST. But the Web did not have a design document, and
so many people think it merely accumulated. Nevertheless, the design
of the Web can be determined by analyzing the major documents that
describe its protocols and formats: URIs (RFC XXXX), HTTP (RFC XXXX),
HTML (RFC XXXX) and XML (RFC XXXX).

The term REST has evolved (for better or worse) from what Roy
Fielding used it to mean. He was mostly focused on discussing the
virtues and vices of HTTP. What most people mean when they say that
they support REST is that they want Web services to integrate deeply
into the architecture of the Web and use Web technologies to their
fullest. REST only became a rallying cry when it became clear that
an alternative model was arising and that this alternative was in
some senses competitive with the Web.

== In the Beginning ==


There are four main concepts that serve as the characters in this
story. You are probably familiar with them but I need very precise
definitions so bear with me. Our protagonists are documents, data,
programs and protocols. Historically a document was a bag of bits
intended for viewing by a human being. Documents were written by
people for people. Because people are creative, documents vary
radically in size and shape. Data is more often "collected" rather
than "created". For easy maintenance it is often collected in very
constrained ways and stored in highly organized containers: databases.
Programs are sets of instructions written in some language, usually a
programming language. Protocols are ways for programs to communicate
with each other across computer networks.

Without programs there would be no computing so in that sense programs
are wonderful. But in another sense they are a necessary evil.
In particular, programs are extremely hard to analyze. That is why
there are so many bugs in them! Computers can do only very rough
validity checks on them. Programs are also quite hard to use in new
contexts that they were not prepared for. A Macintosh program cannot
(in general) run on a PC. This is because it is hard to analyze the
program and understand what parts of it depend on what quirks of
the Macintosh platform. If programmers need to write code that runs
on both Macintoshes and PCs, they tend to code in a very particular
way from the beginning (choosing their tools carefully and testing as
they go). In essence they do the required analysis a little bit at a
time as they go rather than expecting to be able to automate it later.

To put it another way: if somebody drops 100MB of data into your lap
and asks you to do something useful with it, there are a variety of
analysis tools and techniques you can use to determine its nature and
figure out how to re-use it. If they drop the same 100MB of program
code into your lap you will find that the analysis is in general much
more difficult, because code is intrinsically hard to analyze.

Around the late 1960s, some IBM researchers noticed that existing
technologies did not make a clear boundary between programs
and documents. Some forms of documents were really just simple
programs. Some were actually very complex programs! One virtue
of modern computers is that they can treat code as data and data
as code but usually it makes more sense to treat them as separate
entities. The researchers noticed that if you treat documents as data,
rather than programs, you can much more easily repurpose them for
new environments. They sought to clearly delineate the line between
document data and the code that processed it.

Let me give a concrete example. Suppose you work in a highly secure
environment. Someone sends you a Word document and a Docbook/XML
version. Word mixes code and data. Docbook/XML does not. You load the
one up into Word and the other up into an XML editor. There is a chance
that the first one will cause Word to say something along the lines
of: "I'm sorry. There seems to be a macro in this document. Macros are
program code. Analyzing program code is extremely difficult. Therefore
I can't tell you whether this document is safe or unsafe. Open it at
your own risk." The Docbook version will never give such a warning
because Docbook separates document data from code so that there is
no need for such a security audit. Similarly, Word macros tend not
to work properly in Word-like programs that claim to work with Word
documents. Docbook documents are entirely platform independent.


== SGML Philosophy ==


Docbook was originally based upon SGML, the language that came
out of that research at IBM. Docbook's easy environment-independence
comes from using SGML and, later, XML, and from keeping in mind the
philosophies that surround them.


People who grew up with SGML and XML came to internalize certain
philosophies. One was that separation of data and processing is
sometimes inconvenient in the short term but generally pays dividends
in the long term. Once you've separated the data from its processing
you can often find new uses for the data that you did not originally
expect. Because data, unlike programs, is easy to analyze and reuse,
SGML people tend to strongly prefer systems that encode as much of
the logic as possible in data rather than programs.

Average programmers, not surprisingly, feel uncomfortable with this.
If programming is what you get paid for, why would you want to
decrease the importance of programs? Moving logic from program code
to data is also a hard sell for the same reason that jogging is a
hard sell. It often (but not always) involves short-term pain for
long-term gain. Furthermore, the nature of the long-term gain is
typically unforeseeable. Any well-defined, present requirement can be
coded into a program. It is the ill-defined, future requirement that
the program will probably have problems with. Because data is easier
to analyze, it is easier to repurpose to solve new problems.

Most people do not understand that the Web was designed precisely
in opposition to the idea that data should be accessed through
programmatic methods rather than being treated as intrinsically
inert.

http://www.w3.org/DesignIssues/Principles.html#PLP

The most advanced techniques for moving logic from code to data are
called "knowledge technologies." RDF is the knowledge technology from
the W3C. Topic maps are from ISO. These knowledge technologies allow
the classification of information items, the association of metadata
with the items and allow relationships between items to be expressed
explicitly in some cases and inferred in other cases.
Inferencing is a way of uniting partially understood information
sources to create new knowledge.

People vary in their level of skepticism about the applicability
of these knowledge technologies. In my mind, that debate is
marginal. Sophisticated developers separate data from processing as
much as possible, whether or not the data conforms to some particular
standard. The knowledge technology standards are just points far
along a spectrum of declarative technologies. Every hyperlink in the
world is a declarative assertion of a relationship between two data
objects. If you understand the benefit of hyperlinks then you
understand what declarative techniques can do for you.
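To make that concrete, here is a minimal sketch in Python (standard library only) of a program that knows nothing about a page's purpose yet can still list every relationship the page asserts, simply because links are data rather than code. The page and its example.org URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (source, target) pairs from every <a href=...> in a page."""

    def __init__(self, source_uri):
        super().__init__()
        self.source_uri = source_uri
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                # Each hyperlink is a declarative assertion: source relates to target.
                self.links.append((self.source_uri, href))


# Hypothetical page: no code has to run to discover what it links to.
page = """
<html><body>
  <p>See the <a href="http://example.org/spec">spec</a> and
  the <a href="http://example.org/faq">FAQ</a>.</p>
</body></html>
"""

collector = LinkCollector("http://example.org/index")
collector.feed(page)
print(collector.links)
```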

== The SGML Tribe ==

Members of the SGML tribe also came to deeply revere the related
concepts of linking and addressing. Once again, this comes back to
the idea of reusing information in ways that it was not originally
intended for. If you can get some kind of handle or reference to
an information item, you can reuse it in new forms of information
packaging. For instance, it is possible to link a range of Excel cells
into a Word document. As you change the cells in Excel, the changes
are reflected in Word. You don't have to create the original Excel
spreadsheet with some special flag that says: "I want to reuse this
information." You just tell Word (through drag and drop) that you
are interested in cells A3-D5 in "myfile.xls".


SGML people took this technique to new heights and created extremely
sophisticated systems built upon it. The most sophisticated of these
systems is the World Wide Web, built in part by people who were
part of the SGML tribe (like Dave Raggett), in part by people who
had discovered or reinvented its core ideas without joining the tribe
(like Tim Berners-Lee) and also by people who simply did not understand
it at all (like Marc Andreessen and his cronies, whom I will probably
never forgive for lamentably poor extensions to HTML and the Web).


== The Google Example ==

Let us consider a particular example. One way to do hyperlinking is
with declarative statements about what pages are linked to other
pages. Another way is to put little programs in each element that
invoke the hyperlinking behaviour.

A programmatic approach to hyperlinking would encourage you to
represent the relationships between pages as little snippets of code
that load and display the next page. At first glance this would seem
to be a more sophisticated system because it would be possible to write
sophisticated hyperlinks that do complicated computations and then
choose the appropriate page to load based on those computations. The
code could also do interesting things with the placement and styling
of the pages.


http://www.w3.org/DesignIssues/Evolution.html#Least

Unfortunately, code is difficult to analyze. This means that once
code is deployed the user typically has only a boolean choice:
"run it" or "don't run it."

On the other hand, consider one result of our declarative approach
to hyperlinking: Google. Google is built upon the analysis of Web
hyperlinks. If hyperlinks were code rather than data, Google could
not come along ten years after the invention of the Web and figure
out a completely new way of analyzing and reusing the data embedded
in it. Nobody foresaw Google's ranking algorithm when the Web was
invented. In fact the idea of a central search engine would probably
have seemed insane. But years later, the people at Google noticed
all of the Web data sitting around and they found a new way to make
it valuable. Yahoo has a similar story.
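A rough sketch of why declarative links made this possible: the textbook PageRank power iteration below needs nothing but the link graph itself. It is only the published textbook formulation, not whatever Google runs in production; the damping factor of 0.85, the iteration count and the four-page graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's rank along its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical four-page web, expressed purely as link data.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)
print(best)
```

Page "c" comes out on top because three other pages link to it; nobody had to tell the algorithm why those links exist.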

Another interesting thing about Google is that it works despite the
fact that it does not always know exactly what it is doing. When Google
sees a link, it does not know whether the link points from a child
document to a parent document or vice versa. It does not
know whether the two documents were created by the same person. Google
does not know why the documents are linked. Google sees a link and it
records it. It works from partial understanding rather than waiting
around for complete information.

If Google could only get information out of a service by understanding
every detail of the service's interface, it would not be able to work
with partial understanding and would consequently be crippled (later
on I present an analogy with the phone system that makes this clearer).

http://www.w3.org/DesignIssues/Evolution.html#PartialUnderstanding

Web-based services such as Google, Yahoo, Blogger and Meerkat add value
to existing information by making links to that information. They
show what you can do by using XML and hyperlinks to create services
that work with partial information based on analysis of data.


== SOAP's Role ==

The programming world spent years trying to avoid using a data-centric
model for networking. They used protocols such as DCOM and CORBA which
hid the data behind programmatic interfaces (APIs) and delivered the
data across the network in unreadable binary packets. These are called
Remote Procedure Call (RPC) protocols. One camp of people thought
that there were only two things wrong with this strategy. First,
they disliked that there were BOTH DCOM and CORBA. They logically
felt that there should really be a single standard for RPC over the
Internet. Second, they thought that the binary packets were turning
off Internet programmers, who are more comfortable with text-based
formats. This group invented what we will call "SOAP-RPC", which was
an XML-based remote procedure call protocol for use over the Web.
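For readers who have never looked at one, this is roughly what a SOAP-RPC request looks like, built with Python's standard `xml.etree.ElementTree`. Only the envelope namespace is taken from SOAP 1.1; the `urn:example-bank` namespace, the method name and its parameter are hypothetical. Notice that everything the request is about travels as parameters inside the body, while the only address involved is the endpoint the envelope is sent to.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
BANK_NS = "urn:example-bank"  # hypothetical application namespace

# Prefixes used when serializing (purely cosmetic).
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("bank", BANK_NS)


def rpc_envelope(method, **params):
    """Build a SOAP-RPC style request: the body holds one element named after the method."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{BANK_NS}}}{method}")
    for name, value in params.items():
        # Parameters become child elements: the account is data, not an address.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call; the request would be POSTed to the single bank endpoint.
request = rpc_envelope("getUserNameFromAccount", account=23212343)
print(request)
```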

Another camp thought that this was not enough to allow messages to flow
freely between businesses. They thought that an important requirement
was "loose coupling". In other words they wanted to make it possible
for clients and servers to evolve independently. XML allows this but
does not require it. It takes extra effort to create systems that are
loosely coupled. Over the last several years, people espousing this
view have taken over the job of directing the SOAP specification. Most
of the more clued-in architects at vendor companies agree. They see
the RPC model as backwards.

http://www.prescod.net/soap/views

Neither camp incorporated lessons from the SGML and XML tradition. In
particular they did not design SOAP such that individual data objects
would be addressable. For instance, if you were designing a banking
web service using standard SOAP tools, you would almost certainly
make the bank the only explicitly addressable object. In other words
the only URI would be for the bank. To send a message to a particular
account you would do something like this:

bank = new SOAPProxy("http://.....")
bank.addMoneyToAccount(23423532, 50)   // account number, amount in dollars

To get information about a particular user, you would do something
like this:

bank = new SOAPProxy("http://.....")
bank.getUserNameFromAccount(23212343)   // account number

Note that the account itself is not addressable. It has no URI. In the
Web-centric version of the service the accounts would be addressable.
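A minimal sketch of the difference, assuming a hypothetical `bank.example` host and an `/accounts/{number}` URI pattern, with a dictionary standing in for the server and a plain function standing in for HTTP GET. Once every account has its own URI, a third party can store and pass around a reference to a single account and dereference it later without knowing any bank-specific method names.

```python
BANK = "http://bank.example"  # hypothetical host


def account_uri(number):
    """Hypothetical URI pattern: one URI per account."""
    return f"{BANK}/accounts/{number}"


# Server-side state: every account is a resource with its own address.
resources = {
    account_uri(23423532): {"owner": "Alice", "balance": 50},
    account_uri(23212343): {"owner": "Bob", "balance": 75},
}


def get(uri):
    """Stand-in for an HTTP GET on any URI the client has been handed."""
    return resources[uri]


# A third-party service only keeps a list of links; it never needs to know
# about a getUserNameFromAccount method.
watched = [account_uri(23212343), account_uri(23423532)]
owners = [get(uri)["owner"] for uri in watched]
print(owners)
```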

Let me offer an analogy. Suppose you were living temporarily in
a hotel. The hotel might not have direct dial connections from the
outside. In order to call a room you have to contact the operator first
(this is like contacting the "SOAP endpoint") and then ask them to
connect you to your room. Now imagine that there is an outside service
that you would like to buy. It is a once-a-day automated wake-up call
and horoscope service. You try to sign up for the service but when
you are asked to enter the phone number to call back you realize that
there is no single number. The service must contact the operator first
and then the operator must patch them through to you. Obviously the
computer on the other end is not going to be smart enough to know to
go through the operator and the operator will not know to patch the
call through to your room.

If everybody lived in a hotel like that, the wake-up service would
be practically impossible. A particular application of the phone
system would simply cease to exist. Telemarketers would find their
job a little harder too, but I think that they would adapt!

Note that the problem is not obvious in the design of either system. It
is when you try to unite the two systems that you wish that the hotel
had used the international standard phone addressing "syntax" rather
than having an extra level of indirection through the operator. SOAP
services are the operator. The objects they work with (purchase orders,
bank accounts, personnel records) are the hotel rooms. The data is at
the mercy of the SOAP endpoint. If the interfaces of the client and
server do not exactly align, the two cannot communicate. And yet,
participants are often happy to work with partial knowledge and
loosely coupled interfaces. For instance, the automated horoscope
system was not interested in the fact that you happened to be in a
hotel room. Similarly, a system for tabulating monthly pay checks would
not care what precise human resources management system the enterprise
was using. As long as the employee records are available in EmployeeML,
it doesn't matter what workflow was used to get them there.
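Working from partial knowledge is easy when the data is a document. The sketch below assumes a hypothetical EmployeeML vocabulary (the `employee` and `salary` element names are invented here) and uses Python's standard `xml.etree.ElementTree`. The payroll tabulator reads only the two elements it cares about and ignores whatever else the HR system's workflow happened to record.

```python
import xml.etree.ElementTree as ET

# Hypothetical EmployeeML export; the HR system added elements the payroll
# tabulator has never heard of.
records = """
<employees>
  <employee id="e1">
    <name>Ada</name>
    <salary currency="USD">5000</salary>
    <workflowStep>approved-by-manager</workflowStep>
  </employee>
  <employee id="e2">
    <name>Grace</name>
    <salary currency="USD">6000</salary>
    <badgeColour>blue</badgeColour>
  </employee>
</employees>
"""


def monthly_payroll(xml_text):
    """Sum salaries, ignoring every element this program does not understand."""
    root = ET.fromstring(xml_text.strip())
    return sum(int(e.findtext("salary")) for e in root.iter("employee"))


total = monthly_payroll(records)
print(total)
```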

Bear in mind that SOAP's weakness around addressing does not stem
merely from ignoring the lessons of XML and SGML. In fact, older
standards like CORBA and DCOM did much better in this area. Every
object had an address and although the address syntaxes were not
URIs, they were at least standardized within the domain of each RPC
protocol. SOAP lacks any equivalent addressing model. Although it
is the new, new thing, it is actually less sophisticated in this way
than its predecessors.

== SOAP and the Web ==

There is a reason for SOAP's weakness around addressing. If SOAP
unambiguously stated that the addressing syntax for SOAP is URIs then
that would be equivalent to saying that SOAP is designed for use on
the Web and only on the Web. But SOAP advocates are quite clear about
the fact that Web protocols and Web addressing models serve only as
"transports" for SOAP. That is like saying that the Web is a taxi
driver and its only job is to get SOAP from place to place. The SOAP
message can get out of the taxi and into another vehicle for the next
leg of its trip. In technical terms, SOAP "tunnels" through the Web.

This is a core area of divergence between the REST viewpoint and
that of the SOAP specification. The REST viewpoint is that the
Web architecture is incredibly scalable and well-designed. The Web
was explicitly designed to integrate disparate information systems.
But the Web did it on its
terms, by binding them all into a single namespace and encouraging them
to use a single protocol. The central virtue of the Web is that you
don't glue system X and system Y together using the Web as transport
middleware. Rather you make a web interface to system X and a web
interface to system Y and the two systems can link to each other and
exchange information with each other just by virtue of the fact that
they are using the same namespaces, protocols and formats.

Yes, the Web is middleware. But like any middleware you would buy from
a high-priced vendor, the middleware encourages you to map your data
into a common model. This model is REST, the Web Architecture. Any
middleware vendor will tell you that you want to do this kind
of mapping rather than create N*M connections between individual
services. Even so, some people really want to take the short-cut of
tunnelling through the Web. In some cases that might be useful but it
is highly debatable whether it is the W3C's job to make that easier
rather than concentrating on improving the Web so that tunnelling
is not necessary. I, for one, am in favour of treating the Web as a
translator rather than a taxi driver.
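The arithmetic behind that advice is simple. With N client systems and M services, point-to-point integration needs a bespoke connection for every pair, while mapping everything into one common model needs one mapping per system. A tiny sketch, with arbitrarily chosen system counts:

```python
def point_to_point(n_clients, m_services):
    """Every client needs a custom integration with every service: N * M."""
    return n_clients * m_services


def common_model(n_clients, m_services):
    """Every system is mapped once into the shared model: N + M."""
    return n_clients + m_services


# With 50 systems on each side: 2500 bespoke integrations versus 100 mappings.
print(point_to_point(50, 50), common_model(50, 50))
```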


== Interoperability is Key ==

Many of the usage scenarios for SOAP seem to be indirect
point-to-point integrations of system X and system Y, using the
Web as transport middleware. Considering how many systems there are
out there, this model cannot scale. It quickly becomes obvious that
there will need to be standardized interfaces with many interoperable
implementations, just as there are many interoperable implementations
of HTTP and SMTP. SOAP itself does not guarantee interoperability. It
is protocols built on top of SOAP (probably specified in WSDL) that
will guarantee interoperability.

These standardized interfaces will be the basis of the Web Services
revolution. Some vendors promote SOAP as a way that any half-decent
programmer can generate a new protocol by running a tool over their
Java or C# program. Even the vendors are coming to agree that this
is a naive point of view. First, it is naive because the services so
generated are extremely poor from a distributed computing point of
view. They are as brittle as communion wafers. They are as tightly
coupled as salsa dancers. Try adding an extra parameter to the
Google service!
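A small illustration of that brittleness in Python, using a hypothetical search stub generated from a fixed signature (it is not the actual Google API). When a newer client sends one parameter the stub was not generated with, the call fails outright; a handler that treats the message as a document simply ignores what it does not recognise.

```python
def do_search(query, max_results):
    """Hypothetical stub generated from a fixed RPC signature."""
    return [f"result for {query}"][:max_results]


def handle_search_document(message):
    """Document-style handler: reads the fields it knows, ignores the rest."""
    query = message["query"]
    max_results = message.get("max_results", 10)
    return [f"result for {query}"][:max_results]


# A newer client adds a parameter the server was never told about.
try:
    do_search(query="rest", max_results=5, language="en")
    rpc_ok = True
except TypeError:
    rpc_ok = False  # the tightly coupled call breaks

doc_result = handle_search_document({"query": "rest", "max_results": 5, "language": "en"})
print(rpc_ok, doc_result)
```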

Second, it is naive because a blossoming of thousands of protocols
does not get us any closer to interoperability. What we need for
interoperability are a few well-engineered, well-designed, scalable,
secure protocols. The success of the existing web and of email shows
that a few protocols can get a ton of work done. When you book a
plane trip through an online travel agent, that agent isn't using the
"Travel Agent Protocol." It uses the generic HTTP protocol.


HTTP is a special protocol in that it explicitly embodies the Web's
principles. In particular, HTTP revolves around URIs and uses highly
self-describing messages. A radical viewpoint is that HTTP is the
only web services protocol we will need!

Whether or not this is true, there is a sense that the whole SOAP
exercise has been a distraction from the main event. The main event
is determining the data models and data flows necessary to allow
ebusiness to begin to flow. ebXML (whether it turns out to be a
success or failure) is nevertheless attempting to solve the right
problems. The SOAP project is not.

But SOAP is not just a distraction. It actually works against the
process of getting ebusiness correct by training application developers
to avoid using one of their most powerful tools, the hyperlink. This
is analogous to teaching database designers not to use foreign keys,
C programmers not to use pointers or Java programmers not to use
references.

== Separation of Processing from Data ==

Remember that one of the lessons from the SGML days was that it is
good to separate data and program code? The HTTP protocol is brilliant
at ensuring that data is not dependent on code. For instance, imagine
that the department of motor vehicles had a web service that could
deliver information on who owns a particular license plate. I could
integrate the information into an XML document using one line of code:

<xi:include href="http://..../license_handler?V6A4G5"/>

This will execute an HTTP GET. The important thing is that HTTP defines
a protocol for turning URIs into bit streams that can be incorporated
into other documents.


The Web infrastructure lets a client execute this as often as it
likes with no repercussions, because HTTP defines GET as a safe
operation: the client is not responsible for any side effects of
a retrieval.
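The same inclusion can be sketched with Python's standard `xml.etree.ElementInclude` module, which expands `xi:include` elements. To keep the example self-contained, the loader looks URIs up in a dictionary instead of performing a real HTTP GET; the `dmv.example` URI and the owner record are hypothetical. Because the loader only retrieves, the processor can run it as often as it likes.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the DMV service: maps URIs to XML representations.
FAKE_WEB = {
    "http://dmv.example/license_handler?V6A4G5":
        "<owner plate='V6A4G5'><name>Jane Doe</name></owner>",
}


def fetch(href, parse, encoding=None):
    """Loader called by ElementInclude; plays the role of an HTTP GET."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(FAKE_WEB[href])


doc = ET.fromstring(
    "<report xmlns:xi='http://www.w3.org/2001/XInclude'>"
    "<xi:include href='http://dmv.example/license_handler?V6A4G5'/>"
    "</report>"
)
ElementInclude.include(doc, loader=fetch)  # replaces xi:include in place
print(ET.tostring(doc, encoding="unicode"))
```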

On the other hand, SOAP would require me to somehow embed a method
call in my XML document. There is no syntax for doing this and there
cannot be one. One of the Web's principal axioms is that users are not
responsible for any damage that is done by following a hyperlink. SOAP
methods have no way to communicate whether the method does or does
not do anything that could be considered damaging. This shifts the
responsibility to the client software in a manner that is not scalable.
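The asymmetry fits in a few lines. With HTTP a generic client can decide from the method alone whether following a reference automatically is harmless, since the HTTP specification (RFC 9110) defines GET, HEAD, OPTIONS and TRACE as safe. An RPC operation name is application-defined, so the best a generic client can answer is "unknown". The function and the request tuples below are illustrative only.

```python
# Methods the HTTP specification defines as safe.
SAFE_HTTP_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def may_follow_automatically(request):
    """Return True/False when safety is knowable from the protocol, None otherwise.

    `request` is a hypothetical (protocol, operation) pair.
    """
    protocol, operation = request
    if protocol == "http":
        return operation.upper() in SAFE_HTTP_METHODS
    # For an RPC call the operation name is application-defined; nothing in
    # the protocol says whether getUserNameFromAccount is harmless.
    return None


checks = {
    "GET a license record": may_follow_automatically(("http", "GET")),
    "POST a payment": may_follow_automatically(("http", "POST")),
    "SOAP getUserNameFromAccount": may_follow_automatically(("soap", "getUserNameFromAccount")),
}
print(checks)
```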


== REST From a Protocols Point of View ==

There is another way to come to the REST position without any interest
in Web architecture. One can look at SOAP purely as a protocol without
even considering the problems it is supposed to solve. It bends or
breaks all of the rules common among Internet protocols. For instance,
if one reads a book on networking protocol layers, it is clear
that SOAP is a layer 6 (presentation layer) protocol but SOAP actually
runs on top of HTTP and SMTP, which are both layer 7 protocols. That
makes SOAP a layer 8 protocol. Unfortunately the standard network stack
diagrams max out at 7! By definition the 7th layer, the application
layer, is the top layer. SOAP treats layer 7 protocols as "transport"
protocols. It pretends that they are layer 4 protocols. This causes
various sorts of discomfort for system administrators, security
analysts and protocol purists.

There is a sense that there will be one or two widely publicized SOAP
security breaches and firewall administrators will shut SOAP traffic
down. Although any protocol can be used as the basis for a security
hole, SOAP arguably encourages security holes by encouraging every
business analyst to become a protocol designer. Also, the industry
has not been clear from a marketing perspective that SOAP is just a
basis for application protocols, not a top-level protocol itself. That
means that SOAP could get tarred with the security flaws that appear
in protocols based upon it.

But SOAP's biggest hole when it comes to protocols is the issue
I discussed of many protocols versus a few. No "many protocols"
framework has ever taken hold on the Internet. Security concerns
are one reason. It is easier to analyze and understand the security
implications of a few standardized protocols than of hundreds of
unstandardized ones. Another reason is administration. Administering
a firewall that supports many protocols is harder than administering
one that supports a few. Finally there is the big one, interoperability.
Getting all of these independently created protocols to talk to each
other will be a nightmare.


== HTTP Does More than you Think ==

HTTP is wonderful in that it natively separates data from
processing. But it is also more general than just a document
fetching protocol. Most people do not know that HTTP is 100%
CRUD-compliant. That means that it has methods that map to "create",
"retrieve", "update", "delete" and "replace", just as SQL does. The
power of URIs, combined with the HTTP methods, makes HTTP an extremely
general protocol for manipulating information sources. The SQL
analogy should be suggestive of the sort of flexibility available!
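One common convention maps those operations onto HTTP as POST for create, GET for retrieve, PUT for replace and DELETE for delete; partial updates are usually done with POST or, in later HTTP extensions, PATCH (RFC 5789). The toy server below is a sketch of that convention, assuming an in-memory store keyed by URI and hypothetical `/orders/` URIs; it is not a real HTTP server.

```python
import itertools


class ResourceStore:
    """Toy server: each resource lives at its own URI and is manipulated with HTTP verbs."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def handle(self, method, uri, body=None):
        if method == "POST":  # create: the server picks a new URI under `uri`
            new_uri = f"{uri.rstrip('/')}/{next(self._ids)}"
            self.resources[new_uri] = body
            return 201, new_uri
        if method == "GET":  # retrieve
            if uri in self.resources:
                return 200, self.resources[uri]
            return 404, None
        if method == "PUT":  # replace (or create at a client-chosen URI)
            existed = uri in self.resources
            self.resources[uri] = body
            return (200 if existed else 201), uri
        if method == "DELETE":  # delete
            if uri not in self.resources:
                return 404, None
            del self.resources[uri]
            return 204, None
        return 405, None  # method not allowed


store = ResourceStore()
status, order_uri = store.handle("POST", "/orders/", {"item": "book", "qty": 1})
store.handle("PUT", order_uri, {"item": "book", "qty": 2})
after_put = store.handle("GET", order_uri)
store.handle("DELETE", order_uri)
after_delete = store.handle("GET", order_uri)
print(status, order_uri, after_put, after_delete)
```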


== Limits of The Web-As-We-Know-It ==

HTTP is not perfect. The Web is not perfect. We are not at the end
of history. We need new and better stuff. The REST argument is that
we need to understand what we have and make certain we do not lose
features we have today. SOAP makes it extremely difficult to use
hyperlinks because SOAP is fundamentally a technology for tunnelling
through the Web.



Tue May 21, 2002 6:16 am

Paul Prescod (papresco)