It seems to me that there are three distinct kinds of jobs that have
been smushed together into XML. Sometimes such merging of
functionality results in synergies, as when PL/1 smushed together
features from Fortran, Cobol, Algol, and Lisp, creating a mess, but
a mess that suggested potential synergies between these elements and
inspired many clean descendants like C and Pascal. In particular,
combining heap-allocated pointer structures from Lisp with the
struct/record concept from (believe it or not) Cobol was very
powerful, and was one of the steps to objects.

The best we can hope for from XML at this point is for it to become
the PL/1 of textual data representation. By merging these functions,
perhaps someone will be inspired by some synergy I don't see and
create something both new and valuable. Frankly, I doubt it. I
think XML is simply an irredeemable incoherent mess, and the three
distinct jobs remain better done separately by the three distinct
tools that have traditionally done these jobs. Though, I think, XML
can suggest some enhancements to these tools. The three tools?

1) Attributed text. This is the job traditionally associated with a
number of text formats, including FrameMaker's, RTF, and others.
HTML has clearly taken over this world, and if one wants to be
compatible with something, HTML, not XML, is clearly the huge
installed base to be compatible with.

2) S-Expressions. As John McCarthy (creator of Lisp) says "XML is
just S-Expressions, only ten times as verbose". (And, I'd add, about
one hundred times as complex.) As I've said on this list before, if
you need to say you're "XML compatible", and there are many marketing
reasons to do this, Minimal-XML is exciting *because* it removes all
the extraneous crap from XML, leaving just the S-Expressions and the
compatibility. The one cool thing XML does add to traditional
S-Expressions (that Minimal-XML, wisely at this point, leaves out) is
the notion of grammars over S-Expressions. But S-Expressions in
ANTLR http://www.antlr.org/ do this tree grammar thing much better
than XML does, and do it over actual Lisp-like S-Expressions. For
those who don't need XML compatibility, I recommend ANTLR.

3) Object serialization. Both Java and CORBA have created well-known
binary serialization formats. Java's is unnecessarily complex, and
has the problem that it's perceived to be language specific (it's
not). CORBA's is understandably crippled, being part of CORBA.
Worse, both are defined only as binary formats. Key to XML's
marketing success is that it's a textual format, and one can
therefore use text in books as an example of the encoding. The world
needs a good, language-neutral, flexible abstract object
serialization format with two concrete syntaxes: an efficient binary
one and a readable/editable textual one. Unlike XML, XML/SOAP, or
YAML, it
should represent arbitrary graphs straightforwardly. There should be
a full-fidelity converter in each direction between these formats. I
suspect such serialization systems already exist, and that some are
good, but none are yet widely known. In today's world, where XML
compatibility is such a crushing issue, I doubt any will become
widely known. But perhaps.
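
To make the S-Expression comparison in item 2 concrete, here is a
minimal Python sketch that rewrites an element-only XML document as an
S-Expression. It assumes a stripped-down subset with no attributes and
no mixed content, roughly the style item 2 attributes to Minimal-XML.
The function name to_sexpr and the sample document are made up for
illustration; this is not ANTLR's notation or anyone's official
mapping.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element as (tag child ...) or, for a leaf, (tag "text")."""
    children = list(elem)
    if children:
        inner = " ".join(to_sexpr(child) for child in children)
        return f"({elem.tag} {inner})"
    text = (elem.text or "").strip()
    if text:
        # Escape backslashes and quotes so the string literal stays readable.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'({elem.tag} "{escaped}")'
    return f"({elem.tag})"

# Hypothetical sample document, element-only (no attributes).
xml_doc = (
    "<person><name>Ada</name>"
    "<langs><lang>Lisp</lang><lang>Cobol</lang></langs></person>"
)
sexpr = to_sexpr(ET.fromstring(xml_doc))
print(sexpr)
# (person (name "Ada") (langs (lang "Lisp") (lang "Cobol")))
```

The closing tags are what the S-Expression form drops, which is where
most of the size difference in this small sample comes from.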
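
As a rough illustration of what item 3 asks for, the sketch below uses
one abstract form (a table of numbered records whose links are record
indices) with two concrete syntaxes, one textual and one binary, and
converters that round-trip between them. Everything here is an
assumption made for the example: the Node class, the record layout,
JSON as the textual syntax, and the little-endian length-prefixed
binary layout. It is not a description of any existing serialization
system.

```python
import json
import struct

class Node:
    """Hypothetical object type: a name plus references to other nodes."""
    def __init__(self, name):
        self.name = name
        self.links = []

def to_abstract(root):
    """Flatten a graph into [(name, [link indices])]; the root is record 0."""
    index, order, stack = {}, [], [root]
    while stack:
        node = stack.pop()
        if id(node) in index:          # already numbered: shared node or cycle
            continue
        index[id(node)] = len(order)
        order.append(node)
        stack.extend(node.links)
    return [(n.name, [index[id(m)] for m in n.links]) for n in order]

def from_abstract(records):
    """Rebuild the graph: create every node first, then wire up the links."""
    nodes = [Node(name) for name, _ in records]
    for node, (_, links) in zip(nodes, records):
        node.links = [nodes[i] for i in links]
    return nodes[0]

def to_text(records):
    return json.dumps(records)

def from_text(text):
    return [(name, links) for name, links in json.loads(text)]

def to_binary(records):
    out = [struct.pack("<I", len(records))]
    for name, links in records:
        raw = name.encode("utf-8")
        out.append(struct.pack("<I", len(raw)) + raw)
        out.append(struct.pack(f"<{len(links) + 1}I", len(links), *links))
    return b"".join(out)

def from_binary(data):
    pos = 0
    def read_u32():
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value
    records = []
    for _ in range(read_u32()):
        size = read_u32()
        name = data[pos:pos + size].decode("utf-8")
        pos += size
        count = read_u32()
        records.append((name, [read_u32() for _ in range(count)]))
    return records

# A graph with a cycle (a <-> b) and a shared node (c).
a, b, c = Node("a"), Node("b"), Node("c")
a.links = [b, c]
b.links = [a, c]

records = to_abstract(a)
text = to_text(records)
blob = to_binary(records)
text_from_blob = to_text(from_binary(blob))    # binary -> text
blob_from_text = to_binary(from_text(text))    # text -> binary
rebuilt = from_abstract(from_binary(blob))
```

Because links are stored as record indices rather than nested values,
the cycle and the shared node need no special handling, and both
concrete syntaxes are thin encodings of the same table.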