Yes, it is natural in Water to convert HTML into Water objects.
Because the HTML tags are already defined in Water, you can
simply execute the string:
"<b> hi </b>".<execute/>
This assumes that the HTML is well-formed XHTML, which is
rarely the case for real web pages. To convert HTML to XHTML,
call html_to_xhtml on the string. It cleans up the HTML (for
example, by closing unclosed tags) and returns XHTML.
"<b> hi ".<html_to_xhtml/>
result="<b> hi</b>"
You can then execute the string:
"<b> hi ".<html_to_xhtml/>.<execute/>
result=<b 0="hi"/>
Helpful hint:
Sometimes it is better to grab just a substring of
the page, using 'key_of' to locate the start
and end of a section (say, one table with data),
and then convert that substring using
the technique above.
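The slicing logic behind this hint can be sketched as follows. This is Python rather than Water, used only to illustrate the idea; in Water the two positions would come from 'key_of'. The function name extract_section and the sample page string are hypothetical and not part of Water or of the page in question.

```python
def extract_section(page, start_marker, end_marker):
    """Return the first span of page from start_marker through end_marker.

    Returns None if either marker is missing. Hypothetical helper,
    written only to illustrate the substring approach.
    """
    # Locate where the section starts.
    start = page.find(start_marker)
    if start == -1:
        return None
    # Look for the end marker only after the start marker.
    end = page.find(end_marker, start + len(start_marker))
    if end == -1:
        return None
    # Include the end marker itself in the result.
    return page[start:end + len(end_marker)]


# Made-up sample page with unclosed <p> tags, as real HTML often has.
page = "<html><body><p>intro<table><tr><td>42</td></tr></table><p>footer</body></html>"
section = extract_section(page, "<table", "</table>")
print(section)
```

The extracted section is much smaller than the full page, so only that piece needs to go through html_to_xhtml and execute as shown above.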
<Mike>
----- Original Message -----
From: skramer072
To: waterlanguage@yahoogroups.com
Sent: Thursday, November 10, 2005 6:18 AM
Subject: [Water] Parsing html pages
I see a lot of examples of how to convert Water's HTML objects into
strings, but I can't find any examples of how to convert a string of
HTML into hypertext objects.
I am downloading a web page and trying to parse it. I'm using the
content attribute of the <web/> object to get the contents of the site
as a string. It seems to me that it should be natural in Water to get
the entire webpage I am downloading as objects.
Here's how I download the page:
<set x=<web "http://citeseer.ist.psu.edu/Kobayashi00information.html"/> />
I'm getting the content like so:
x.content