tagsoup-friends · Friends of TagSoup
Messages

Messages 74 - 103 of 1386
74
Hi John, Anyone, Do you happen to have any URI lists or http-able sized collections of soupy HTML? I've been playing with a really crude tagsoup-style parser, ...
Danny Ayers
danny_ayers
Jun 15, 2004
11:57 pm
75
... It could be done, but are you sure that's what you want? It would entail, for instance, that a sequence of paragraphs like <p>foobar <p>bazzam <p>quxquux ...
John Cowan
johnwcowan
Jun 17, 2004
2:00 am
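The point about `<p>foobar <p>bazzam <p>quxquux` can be illustrated with a toy model. This is a sketch only, assuming a single HTML-like rule ("a `<p>` start tag implicitly closes an open `<p>`"); it is not TagSoup's schema engine, and the class and method names are hypothetical. Without the rule, each paragraph nests inside the previous one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy model only: compares naive nesting with one HTML-like rule,
// "a <p> start tag implicitly closes an open <p>". Not TagSoup's algorithm.
final class ParagraphNesting {

    // Returns the deepest element nesting reached while processing the start tags.
    static int maxDepth(List<String> startTags, boolean pClosesP) {
        Deque<String> open = new ArrayDeque<>();
        int max = 0;
        for (String tag : startTags) {
            if (pClosesP && tag.equals("p") && "p".equals(open.peek())) {
                open.pop(); // implicit </p> before the new paragraph
            }
            open.push(tag);
            max = Math.max(max, open.size());
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("p", "p", "p"); // <p>foobar <p>bazzam <p>quxquux
        System.out.println("without structure rules: depth " + maxDepth(tags, false));
        System.out.println("with p-closes-p rule:    depth " + maxDepth(tags, true));
    }
}
```

Run as-is, it reports depth 3 for the naive case and depth 1 with the rule.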
76
... In the next release I'll make "script" allowed to appear anywhere, since browsers seem to allow it anywhere. ... In general, yes. -- But you, Wormtongue,...
John Cowan
johnwcowan
Jun 17, 2004
1:46 pm
77
I've looked everywhere and can't find any relevant Java examples of how to use this library... not even a quick 5-liner with a string...?...
richard_hassinger
richard_hass...
Jun 17, 2004
3:47 pm
78
... Well, it's a SAX parser: you can learn how to use SAX parsers at http://www.saxproject.org . You can also look at the static main, tidy, and chooseContent...
cowan@...
johnwcowan
Jun 17, 2004
7:32 pm
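Since TagSoup is a SAX parser, the "quick 5-liner with a string" asked for in message 77 is ordinary SAX code. A minimal sketch follows, assuming only the standard `org.xml.sax` API: it parses a string and collects element names through `startElement`. It uses the JDK's built-in parser, which accepts only well-formed XML; with the TagSoup jar on the classpath, `new org.ccil.cowan.tagsoup.Parser()` (also an `XMLReader`) could be passed instead to handle real-world HTML. `SaxFromString` and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SaxFromString {

    // Collects element local names in document order via the startElement callback.
    static List<String> elementNames(XMLReader reader, String markup) {
        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(localName);
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return names;
    }

    // The JDK's built-in SAX parser; it rejects anything that is not well-formed XML.
    static XMLReader jdkReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // guarantees localName is reported
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With TagSoup available: elementNames(new org.ccil.cowan.tagsoup.Parser(), html)
        System.out.println(elementNames(jdkReader(), "<html><body><p>Hello</p></body></html>"));
    }
}
```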
79
... That's a for-sure bug. Can you send me the document exactly as is, so I can reproduce the problem? Thanks. -- Si hoc legere scis, nimium eruditionis... [Latin: "If you can read this, too much learning..."]
John Cowan
johnwcowan
Jun 18, 2004
5:28 am
80
Hello, I'm using TagSoup 0.9.4 to transform HTML into well-formed XML and get the following wrong results: <h2 align="center" style="margin-top: 0;...
Fabrice Estiévenart
fe@...
Jun 21, 2004
2:23 pm
81
(Re-)hello, is it possible to disable TagSoup's restructuring of HTML tags (parents, inline and block-level elements)? Thanks, Fabrice...
Fabrice Estiévenart
fe@...
Jun 21, 2004
3:47 pm
82
... Tell me more about this idea; I don't understand what purpose such an option would have. -- "In my last lifetime, John Cowan I...
John Cowan
johnwcowan
Jun 21, 2004
4:35 pm
83
Sorry for the poor wording of my question; let me rephrase it: I'd like to transform bad HTML into XML, keeping the initial tag structure and without...
Fabrice Estiévenart
fe@...
Jun 22, 2004
7:45 am
84
Hi group, I am trying to parse the following: "<td>this is very <b>important</b> stuff.</td>" into: "this is very important stuff." It seems simple...
gzcao
Jun 23, 2004
7:13 pm
85
Forgot there is also the startElement event. Duh....
gzcao
Jun 23, 2004
7:35 pm
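Turning `<td>this is very <b>important</b> stuff.</td>` into plain text, as asked in message 84, comes down to appending every `characters` callback to a buffer and ignoring the tag events. A minimal sketch, assuming the JDK's SAX parser on well-formed input; for real tag soup the TagSoup `org.ccil.cowan.tagsoup.Parser` would be used as the `XMLReader` instead. `TextOnly` and `TextCollector` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class TextOnly {

    // Concatenates all character data; tags such as <b> contribute nothing.
    static final class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split one text run across several calls, so always append.
            text.append(ch, start, length);
        }

        String text() {
            return text.toString();
        }
    }

    static String extractText(String markup) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TextCollector collector = new TextCollector();
            reader.setContentHandler(collector);
            reader.parse(new InputSource(new StringReader(markup)));
            return collector.text();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractText("<td>this is very <b>important</b> stuff.</td>"));
    }
}
```

This prints `this is very important stuff.`; the `startElement` event mentioned in message 85 is only needed if the handler must react to particular tags.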
86
I don't know if this is a tagsoup issue or not, but perhaps someone can steer me the right way.... I've got an application where I am feeding soup into...
chris_bitmead
Jun 30, 2004
3:18 pm
87
... If JTidy treats "&nbsp;" and U+00A0 differently, then I have to say it's buggy. These are supposed to be exactly equivalent in HTML files. ... It's hard...
John Cowan
johnwcowan
Jun 30, 2004
3:28 pm
88
When I do an octal dump I get 0302, 0240 (i.e. C2, A0). Is that what you write as U+00A0? Is that the UTF-8 encoding thereof? It would hardly surprise me if...
Chris B.
chris_bitmead
Jul 1, 2004
8:36 am
89
... Yes, exactly. ... The output of the TagSoup main program is always UTF-8; you may need to tell JTidy that. ... I'd write a replacement main() method. -- ...
John Cowan
johnwcowan
Jul 1, 2004
11:33 am
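The octal dump in message 88 can be checked directly: U+00A0 (the character `&nbsp;` stands for) encodes in UTF-8 as the two bytes C2 A0, octal 0302 0240. The sketch below, using only the JDK, also shows what a consumer that wrongly assumes ISO-8859-1 sees in those bytes: two characters instead of one, which is why the downstream tool may need to be told the input is UTF-8. `NbspBytes` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

final class NbspBytes {

    // U+00A0 NO-BREAK SPACE, the character that &nbsp; stands for.
    static final String NBSP = "\u00A0";

    // Encodes the text as UTF-8, the encoding of the TagSoup main program's output.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // What a consumer that wrongly assumes ISO-8859-1 would decode from those UTF-8 bytes.
    static String misreadAsLatin1(String s) {
        return new String(utf8(s), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        for (byte b : utf8(NBSP)) {
            int v = b & 0xFF; // Java bytes are signed; mask to 0..255
            System.out.printf("hex %02X  octal %03o%n", v, v);
        }
        // Two characters (U+00C2, U+00A0) instead of one
        System.out.println(misreadAsLatin1(NBSP).length());
    }
}
```

It prints `hex C2  octal 302`, `hex A0  octal 240`, then `2`.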
90
Hello, Here is an HTML snippet I tried to run through TagSoup: <td width="435"><nobr><a href="/"><img src="http://g.delfi.lv/d/h/news_on.gif" border=0 alt="Ziņas" width=78...
Kristine
k_tc
Jul 2, 2004
11:47 am
91
I've put together a simple, fairly forgiving SAX2-style HTML/XML parser in Python, may be of interest here. As a demo there's a simple RSS aggregator. ...
Danny Ayers
danny_ayers
Jul 11, 2004
12:15 pm
92
Hi, The Water language currently uses Tidy for converting HTML to XHTML, and I'd like to move to TagSoup because: TagSoup should never fail to return TagSoup...
pluschli
Jul 26, 2004
9:45 pm
93
TagSoup 0.9.5 is now available at the usual place, http://www.ccil.org/~cowan/XML/tagsoup . This is a bug-fix release, but the bug goes right back to the...
John Cowan
johnwcowan
Aug 1, 2004
2:32 am
94
I am using TagSoup with org.apache.xalan.xsltc.trax.SAX2DOM to read Web Pages and create DOM Documents in order to parse data. I have come across web pages on...
WoofGrrrr@...
woofgrrrr
Aug 8, 2004
8:45 pm
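Message 94 builds DOM documents with Xalan's internal `SAX2DOM` class. A portable alternative, sketched here under the assumption that only standard JAXP is available, is an identity `Transformer` fed by a `SAXSource` wrapping any `XMLReader` and writing to a `DOMResult`. This is not what the poster used; the demo uses the JDK's strict parser, and with TagSoup on the classpath `new org.ccil.cowan.tagsoup.Parser()` would be passed as the reader. `DomFromSax` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class DomFromSax {

    // Drives the given XMLReader and captures its SAX events as a new DOM Document.
    static Document toDom(XMLReader reader, String markup) {
        try {
            DOMResult result = new DOMResult(); // no node supplied: a new Document is created
            SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
            TransformerFactory.newInstance().newTransformer().transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("conversion failed", e);
        }
    }

    // The JDK's namespace-aware SAX parser (well-formed XML only).
    static XMLReader jdkReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With TagSoup available: toDom(new org.ccil.cowan.tagsoup.Parser(), html)
        Document doc = toDom(jdkReader(), "<html><body><p>Hi</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```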
95
... This is a known problem which will be fixed in the next release, which I expect to have out shortly. Attribute names beginning with digits will be changed...
John Cowan
johnwcowan
Aug 9, 2004
3:00 am
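Attribute names beginning with digits are a problem for DOM builders because they are not legal XML names: an XML 1.0 name may not start with a digit. What TagSoup changes such names to is cut off in the message above, so this sketch only shows the check. It is a simplified, ASCII-only version of the XML `NameStartChar`/`NameChar` rules (real XML names also allow many non-ASCII ranges), and `XmlNameCheck` is a hypothetical helper, not part of TagSoup.

```java
// Simplified, ASCII-only subset of the XML 1.0 Name production.
final class XmlNameCheck {

    static boolean isAsciiXmlName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        // Name start: letter, underscore or colon; digits, '-' and '.' are not allowed here.
        if (!(isAsciiLetter(first) || first == '_' || first == ':')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
                    || c == '_' || c == ':' || c == '-' || c == '.';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        for (String n : new String[] {"width", "data-x", "1st", "2"}) {
            System.out.println(n + " -> " + isAsciiXmlName(n));
        }
    }
}
```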
96
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish,...
John Cowan
johnwcowan
Aug 10, 2004
7:23 pm
97
This release fixes a paper-bag bug in 0.9.6 that went undiscovered; all newlines in character content were being changed to spaces. See ...
John Cowan
johnwcowan
Aug 13, 2004
9:42 pm
98
Thank you for giving me something amusing to google. I've had a few of those in my day. :-) Howard...
Howard Katz
howardk@...
Aug 13, 2004
10:05 pm
99
Help! I'm trying to compile the example shown on Hackdiary, http://www.hackdiary.com/archives/000041.html I loaded Perl. I loaded Xalan.jar in \lib under my...
tombee641
Aug 15, 2004
4:01 pm
100
... Discard all the classes in the tagsoup-0.9.7/src/java/.../test directory; they were released by accident. I have now yanked them from both the source and...
John Cowan
johnwcowan
Aug 15, 2004
9:37 pm
101
Hi, the XML header is added a second time when it already exists. Run TagSoup on the attached test file to see... Thanks for your help, Sytse Hengeveld ... ...
sytse@...
Aug 16, 2004
12:46 pm
102
... Thanks. This is one of the last remaining well-formedness bugs, and it'll be fixed in the next release (I would have fixed it in this one, except for some...
John Cowan
johnwcowan
Aug 16, 2004
12:52 pm
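Why a duplicated XML header counts as a well-formedness bug: the XML declaration may appear only once, at the very start of a document, and a second `<?xml ...?>` is parsed as a processing instruction with the reserved target `xml`, which is a fatal error. The sketch below checks this with the JDK's non-validating SAX parser; `WellFormedCheck` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {

    // Returns true if the JDK's non-validating SAX parser accepts the text.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // fatal well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String once = "<?xml version=\"1.0\"?>\n<doc/>";
        String twice = "<?xml version=\"1.0\"?>\n<?xml version=\"1.0\"?>\n<doc/>";
        System.out.println(isWellFormed(once));
        System.out.println(isWellFormed(twice));
    }
}
```

The first call prints `true` and the second `false`.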
103
I think I see what's happening. According to the HTML DTD NOSCRIPT is not allowed in the HEAD. Therefore, TagSoup closes the HEAD as soon as it sees...
Elliotte Rusty Harold
elharo@...
Aug 17, 2004
9:00 pm
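The behavior described in message 103 follows from the HEAD content model: in the HTML 4.01 DTD, HEAD may contain TITLE, BASE and the `%head.misc` elements (SCRIPT, STYLE, META, LINK, OBJECT), and NOSCRIPT is not among them. The toy model below, a sketch assuming lowercase tag names and ignoring everything else a real parser does, finds the first start tag inside HEAD that a DTD-driven repairer would treat as the end of HEAD. It is not TagSoup's implementation, and `HeadCloser` is a hypothetical name.

```java
import java.util.List;
import java.util.Set;

// Toy model of one DTD rule; not TagSoup's actual schema handling.
final class HeadCloser {

    // HTML 4.01: TITLE, BASE and %head.misc (SCRIPT|STYLE|META|LINK|OBJECT).
    private static final Set<String> HEAD_CHILDREN =
            Set.of("title", "base", "script", "style", "meta", "link", "object");

    // Index of the first start tag not allowed in HEAD, i.e. where HEAD
    // would be implicitly closed; -1 if every tag is allowed.
    static int headEndsAt(List<String> tagsInsideHead) {
        for (int i = 0; i < tagsInsideHead.size(); i++) {
            if (!HEAD_CHILDREN.contains(tagsInsideHead.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("title", "meta", "noscript", "link");
        // noscript (index 2) ends HEAD, so the later link ends up outside it
        System.out.println(headEndsAt(tags));
    }
}
```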
Copyright © 2009 Yahoo! Inc. All rights reserved.