No, I'm not paranoid enough yet. It's not sufficient only to say
that the HTML is encoded as UTF-8 (see below).
David-Sarah Hopwood wrote:
[...]
> The HTML or XHTML document starts with a correct <!DOCTYPE or
> <?xml declaration respectively,
I meant, the document starts with <!DOCTYPE HTML> in the case
of HTML, or <?xml version="1.0"?><!DOCTYPE HTML> in the case of
XHTML.
(This will also put the parser into sane^H^H^H^Hstandards mode.)
> and is encoded as well-formed UTF-8.
The document must also start with a UTF-8 BOM, *and* must not
contain a META directive that changes the charset, *and* in the
case of HTML, must either be retrieved from a local file or over
HTTP with the header "Content-Type: text/html; charset=UTF-8".
This is because the method of determining the encoding is chosen
based on the phase of the moon.
Any other problems?
--
David-Sarah Hopwood ⚥