tagsoup-friends · Friends of TagSoup
Messages 1240 - 1269 of 1386
1240
Here's a simplified example of the HTML I'm trying to parse: <p> <span id="data"> <p>important information</p> </span> </p> And here's what I get out of...
mark_renouf
Jan 8, 2009
3:54 am
1241
Tagsoup is right, you're wrong. You can't have block elements such as <p> inside inline elements such as <span>; tagsoup fixes this problem for you. - Godmar...
Godmar Back
godmar@...
Jan 8, 2009
3:57 am
1242
I understand, and that's what I suspected. In this case I'm not interested in correcting the HTML, I simply want to access the contents of the SPAN with id of...
mark_renouf
Jan 8, 2009
4:07 am
1243
... In order to do that, you have to change the HTML grammar in src/definitions/html.tssl to specify a different language. The simplest way to do that is...
John Cowan
johnwcowan
Jan 8, 2009
6:05 am
1244
Hey, Thanks for some great software! I'm having some trouble with manipulating HTML by parsing it with tagsoup into a DOM and then writing it again. The main...
dennis.thrysoe
Jan 15, 2009
2:27 pm
1245
... I may at some future date give the table and form elements a content model of M_ANY, since people are quite good about providing the end tags for them. ......
John Cowan
johnwcowan
Jan 15, 2009
10:03 pm
1246
Hi, I did just that - allowed M_ANY within table and tr, and that fixed my problem. Maybe tagsoup should be distributed with such a "relaxed" schema in the...
dennis.thrysoe
Jan 18, 2009
8:42 am
1247
Hi, I am having an issue. TagSoup seems to convert "&sigma;" to "?". Can I somehow code it to leave it as is? Thanks, Samir...
samirss
Feb 9, 2009
5:49 pm
1248
... No. TagSoup interprets entity references on input, but does not regenerate them on output. But if you set the output encoding to something other than...
John Cowan
johnwcowan
Feb 9, 2009
6:52 pm
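One plausible reading of the "&sigma;" exchange above, offered as an assumption rather than a diagnosis of Samir's setup: once the entity has been resolved to the character σ on input, writing it in an encoding that cannot represent it produces a substitute character. The sketch below uses only the JDK (no TagSoup calls); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SigmaEncodingSketch {
    // Encode the text to bytes in the given charset, then decode it again,
    // which is roughly what happens when output is written and later read.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String sigma = "\u03C3"; // the character that &sigma; refers to

        // US-ASCII has no sigma, so getBytes() substitutes its replacement byte.
        boolean lostInAscii = roundTrip(sigma, StandardCharsets.US_ASCII).equals("?");

        // UTF-8 can represent sigma, so the character survives unchanged.
        boolean keptInUtf8 = roundTrip(sigma, StandardCharsets.UTF_8).equals(sigma);

        System.out.println("lost in US-ASCII: " + lostInAscii);
        System.out.println("kept in UTF-8: " + keptInUtf8);
    }
}
```

If this is what is happening, the fix lies in choosing an output encoding that can represent the character, not in changing how the entity is parsed.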
1249
I feel like I've seen this discussed at some point in the past 5 years, but I can't remember or find the answer. If an HTML page has an ampersand in the text,...
Michael Giles
michael_a_giles
Feb 13, 2009
6:45 pm
1250
... Yes, it should be handled (and returned as a raw &, to be escaped on output as &amp;). ... @#$*, I thought I got rid of that class of bug. Apparently the...
John Cowan
johnwcowan
Feb 13, 2009
7:21 pm
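The rule stated in message 1250 (a raw & in character data, escaped as &amp; on output) is what XML serializers in general do. A minimal sketch using the JDK's StAX writer, which is a stand-in here and not TagSoup's own XMLWriter; the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class AmpersandEscapeSketch {
    // Wrap the given character data in a <p> element and serialize it.
    static String serialize(String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("p");
            writer.writeCharacters(text); // the writer escapes markup characters
            writer.writeEndElement();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // The handler sees the raw "&"; only the serialized form carries "&amp;".
        System.out.println(serialize("AT&T"));
    }
}
```

The point of the split is that application code works with the raw character, and escaping is purely a concern of whatever writes the document back out.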
1251
Hi all, What is the best way to unit test the parser methods like startElement(), endElement(), ... one at a time, and by starting from reading an XML file...
ciel_et_espace
Feb 16, 2009
8:09 am
1252
... You got me there. Parsing is inherently a tightly coupled group of behaviors, since everything depends on building up a rather complex and varying state. ...
John Cowan
johnwcowan
Feb 16, 2009
8:26 am
1253
... Almost by definition unit testing doesn't read files. Passing your own arguments is the right way to *unit* test. That said, it is important to test with...
Elliotte Harold
elharo@...
Feb 16, 2009
2:42 pm
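To illustrate the advice in message 1253 about passing your own arguments: a SAX handler can be exercised by calling its callbacks directly, with no parser and no file involved. The sketch below assumes a hypothetical handler, SpanCollector, that gathers the text inside <span id="data"> (borrowing the markup from message 1240); it is not code from the thread and uses only the JDK's org.xml.sax classes.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerUnitTestSketch {
    /** Hypothetical handler under test: collects text inside span#data. */
    static class SpanCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int depth = 0; // greater than 0 while inside the target span

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            if (depth > 0) {
                depth++; // nested element inside the target span
            } else if ("span".equals(localName) && "data".equals(atts.getValue("id"))) {
                depth = 1; // entering the target span
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (depth > 0) depth--;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth > 0) text.append(ch, start, length);
        }

        String collected() { return text.toString(); }
    }

    private static void chars(SpanCollector h, String s) {
        h.characters(s.toCharArray(), 0, s.length());
    }

    // Drive the handler by hand, one callback at a time.
    static String runScenario() {
        SpanCollector h = new SpanCollector();
        AttributesImpl none = new AttributesImpl();
        AttributesImpl idData = new AttributesImpl();
        idData.addAttribute("", "id", "id", "CDATA", "data");

        h.startElement("", "p", "p", none);
        chars(h, "ignored");
        h.startElement("", "span", "span", idData);
        h.startElement("", "p", "p", none);
        chars(h, "important information");
        h.endElement("", "p", "p");
        h.endElement("", "span", "span");
        chars(h, "also ignored");
        h.endElement("", "p", "p");
        return h.collected();
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

Each callback can also be checked in isolation by inspecting collected() after a single call, while tests that feed real documents through the parser cover the event sequences the parser actually emits.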
1254
Thank you for your answer. Your proposal tends to indicate that we need to go for an intrusive solution in which we modify the real code to throw exceptions...
ciel_et_espace
Feb 16, 2009
7:38 pm
1255
Yes, I agree, and that is what I am doing for the time being. I don't read files but I get my test input from unit test strings. BR, CP. ...
ciel_et_espace
Feb 16, 2009
7:40 pm
1256
... Without more context, I simply can't say. -- John Cowan cowan@... http://ccil.org/~cowan The penguin geeks is happy / As under the waves they lark ...
John Cowan
johnwcowan
Feb 16, 2009
8:04 pm
1257
With TSaxon the -H switch allows one to process (ill formed) HTML files when they are the source. What about when the source file is XML and you're trying to...
neville88
Feb 24, 2009
8:58 pm
1258
... I don't know any way to do that. The -H switch is just shorthand for the Saxon switch '-x org.ccil.cowan.tagsoup.Parser', and that affects both the main...
John Cowan
johnwcowan
Feb 24, 2009
10:11 pm
1259
I want to use TagSoup to process an HTML page (a malformed one) and I got it to work using the command line -H flag. However, when I tried it in code, following...
magmaruless
Mar 3, 2009
12:11 am
1260
OK, I worked around it =). I went to the page: http://home.ccil.org/~cowan/XML/tagsoup/tsaxon/StyleSheet.java and used the same method: ...
magmaruless
Mar 3, 2009
12:32 am
1261
As a followup: I ended up having to pass the output from tagSoup v1.2 into a build of htmlTidy in order to get it to parse in TinyXML for certain html samples...
kiru42
Mar 3, 2009
3:38 am
1262
... Looks like TinyXML is not a conforming XML parser, if it doesn't understand character references. To get UTF-8 output without entities, though, just...
John Cowan
johnwcowan
Mar 4, 2009
7:19 pm
1263
... Erm, I hate to be slightly rude, but haven't we had the conversation about the command line problems re: output encodings and win32? I started this whole...
kiru42
Mar 4, 2009
8:33 pm
1264
Hello there, I'm getting a java.lang.OutOfMemoryError after 400 xslt transformations with Saxon, using tagsoup as the parser. Detailed exception:...
magmaruless
Mar 7, 2009
12:34 pm
1265
The documentation for XMLWriter says * <p>According to the XML Recommendation, <em>all</em> whitespace * in an XML document is potentially significant to an...
Klotz, Leigh
leighklotz
Mar 11, 2009
6:24 pm
1266
... If you look at the Infoset, you'll see that whitespace outside the root element is generally considered nonsignificant, despite the letter of the XML Rec....
John Cowan
johnwcowan
Mar 11, 2009
7:23 pm
1267
... John, Thank you for your quick...
Klotz, Leigh
leighklotz
Mar 11, 2009
8:41 pm
1268
... Sorry, quite right. Since I don't use Windows, I have no idea why the output encoding is broken (if that's really what's happening). Can someone using...
John Cowan
johnwcowan
Mar 11, 2009
9:25 pm
1269
... I can't replicate this: What I get is simply: <?xml version="1.0" standalone="yes"?> <html...
John Cowan
johnwcowan
Mar 11, 2009
9:32 pm
Copyright © 2009 Yahoo! Inc. All rights reserved.