henrysthompson scripsit:
> Understood. I guess what I am thinking about is not shipping some
> events straightaway, but stacking them for a bit until the right move
> is clear -- sort of a 'shift-ship' processor, c.f. 'shift-reduce'. . .
I stated the constraint badly: I can and do postpone SAX events,
but not characters events, since their content is unbounded in size.
Those, at least, must be released to the application in real time.
Consequently, a start-tag can at most be postponed until the next
characters event, and in practice there is not much point in doing
even that.
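
To make the constraint concrete, here is a minimal sketch in Python,
using the standard xml.sax handler interface rather than TagSoup's own
Java code. The DeferringHandler class and its pending queue are
hypothetical names for illustration, not anything that exists in
TagSoup. Element events are stacked; a characters event forces
everything stacked so far to be shipped first.

```python
from collections import deque
from xml.sax.handler import ContentHandler


class DeferringHandler(ContentHandler):
    """Hold back element events; release them no later than the next
    characters event. Hypothetical illustration, not TagSoup code."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream
        self.pending = deque()  # postponed (method name, args) pairs

    def startDocument(self):
        self.downstream.startDocument()

    def startElement(self, name, attrs):
        # "Shift": stack the event instead of shipping it straightaway.
        self.pending.append(("startElement", (name, attrs)))

    def endElement(self, name):
        self.pending.append(("endElement", (name,)))

    def characters(self, content):
        # Character data is unbounded in size, so it is never queued:
        # ship every postponed event, then pass the text through at once.
        self.flush()
        self.downstream.characters(content)

    def endDocument(self):
        self.flush()
        self.downstream.endDocument()

    def flush(self):
        # "Ship": deliver postponed events in their original order.
        while self.pending:
            method, args = self.pending.popleft()
            getattr(self.downstream, method)(*args)
```

Any decision about rearranging the queued events would have to be
taken inside that window, before flush() runs; once an event has been
forwarded downstream it cannot be recalled.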
HTML5, however, exposes the varying state of the tree throughout;
an application may examine a new element's parent pointer at one point,
find one value, and then examine it after new events have been processed
and find a different value. The very nature of SAX events prohibits
that degree of flexibility.
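
A small sketch of the difference, in Python. The Node class and the
particular move shown are assumptions made up for illustration; this
is not a transcription of the HTML5 tree-construction algorithm, only
a picture of a parent pointer changing under the application's feet.

```python
class Node:
    """Hypothetical mutable tree node with a parent pointer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            parent.append(self)

    def append(self, child):
        # Detach from any previous parent so a node has only one parent.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)


body = Node("body")
table = Node("table", body)
p = Node("p", table)          # the builder first hangs p under table

seen_first = p.parent.name    # the application looks now

body.append(p)                # later input causes p to be moved

seen_later = p.parent.name    # the application looks again

# seen_first and seen_later differ. A SAX stream has no equivalent:
# once startElement("p") has been delivered inside "table", that event
# cannot be withdrawn or re-ordered.
```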
> I think that's at least a partial 'no' to my question. Contrast the
> tokeniser with the remedier, if I can call it that. The tokeniser is
> table-driven, and _also_ available to modify in the form of a
> finite-state transducer. The remedier is not available in quite the
> same way. . . I think I'll have a look at making it more so, along
> the lines of the 'shift-ship' idea -- anyone else interested?
I think the distinction is that you have a theory for tokenizers,
although mine is not quite a strict one -- there are some higher-level
interferences from the parser.
There is no theory for the rectifier yet, though there is a high-level
description of what it does in the TagSoup presentation at
http://tagsoup.info/tagsoup.odp (also .ppt and .pdf). Nevertheless,
the description of HTML is a declarative one.
> By which you mean, add some more decoration to keep <table> from
> auto-closing?
It wouldn't be specific to table elements -- nothing in the rectifier
knows anything at all about particular HTML elements. Rather, it's
a matter of when to check the closeMode attribute in the TSSL.
> Curious to know if the above-mooted change provokes any (bad)
> regressions therein. . .
So am I. The difficulty is that without a standard to go by, every
discrepancy has to be examined by hand to see if it is a regression
or a "progression".
--
There is / One art              John Cowan <cowan@...>
No more / No less               http://www.ccil.org/~cowan
To do / All things
With art- / Lessness            -- Piet Hein