tagsoup-friends · Friends of TagSoup
Messages

Messages 221 - 254 of 1386
221
Is there a simple way of getting TagSoup to treat a legal html element as a bogon and eliminate it from the output stream? I'm working with html that uses <dd>...
howardckatz
Mar 4, 2005
6:29 pm
222
... Well, it's easy to do that in XSLT, but if you don't have an XSLT step in your pipeline already, I understand. ... There is no way currently to remove an...
John Cowan
johnwcowan
Mar 4, 2005
8:16 pm
223
I am trying to use TagSoup to convert a page from a Japanese website (http://game.goo.ne.jp/search/all.php?n=%B5%A1%C6% ...
pearandpeanut
Mar 5, 2005
1:34 am
224
... Yes, there is. The simplest approach is to implement a trivial JapaneseAutoDetector class, something like this: class JapaneseAutoDetector implements...
John Cowan
johnwcowan
Mar 5, 2005
3:00 pm
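The snippet in message 224 is cut off after "implements...", so the following is only a minimal sketch of the idea it describes, not John's actual code. It assumes the page is encoded as EUC-JP (the thread excerpt does not say which encoding applies) and that TagSoup's AutoDetector interface has a single method taking an InputStream and returning a Reader. A local stand-in interface is declared here so the sketch compiles without the TagSoup jar; AutoDetectorDemo and decode are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Local stand-in for org.ccil.cowan.tagsoup.AutoDetector (assumed shape),
// declared here so the sketch compiles without the TagSoup jar.
interface AutoDetector {
    Reader autoDetectingReader(InputStream in);
}

// "Detects" nothing: it always decodes the byte stream as EUC-JP.
// The choice of EUC-JP is an assumption, not something stated in the thread.
class JapaneseAutoDetector implements AutoDetector {
    private static final Charset EUC_JP = Charset.forName("EUC-JP");

    @Override
    public Reader autoDetectingReader(InputStream in) {
        return new InputStreamReader(in, EUC_JP);
    }
}

// Hypothetical helper class used only to exercise the detector.
class AutoDetectorDemo {
    // Reads all characters that the detector's Reader produces for the given bytes.
    static String decode(byte[] raw) {
        Reader reader = new JapaneseAutoDetector()
                .autoDetectingReader(new ByteArrayInputStream(raw));
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Round-trip a short Japanese string through EUC-JP bytes.
        String sample = "\u65e5\u672c\u8a9e";
        byte[] raw = sample.getBytes(Charset.forName("EUC-JP"));
        System.out.println(decode(raw).equals(sample));
    }
}
```

In TagSoup itself, a detector like this would presumably be handed to the parser through its auto-detector property; check the Parser documentation for the exact property name before relying on it.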
225
[ snip ... ] ... Thanks John. I'll give that a try. Howard...
howardckatz
Mar 5, 2005
4:28 pm
226
While I was building the TagSoup source code, I was getting the following error: Buildfile: build.xml init: prepare: [mkdir] Created dir:...
kesong
Mar 8, 2005
11:35 pm
230
I know that an eventual goal of tagsoup is to be configurable to clean up other input besides HTML. Is anyone using it for that now, and how much configuration...
DuCharme, Bob (LNG-CHO)
philregion
Mar 14, 2005
9:24 pm
231
... No one has told me about it, if so. -- John Cowan jcowan@... http://www.reutershealth.com "Mr. Lane, if you ever wish anything that I can...
John Cowan
johnwcowan
Mar 14, 2005
9:29 pm
232
I had this problem too. It turned out to be related to Java 1.5. You should have no problems compiling it with a Java 1.4 compiler (I didn't). I think it's...
nezda
Mar 19, 2005
6:30 pm
233
... Ah. I'm still on Java 1.4, and expect to stay there for a while longer. Thanks for the heads-up. -- John Cowan jcowan@......
John Cowan
johnwcowan
Mar 19, 2005
6:31 pm
234
Hello, I was just getting started using TagSoup and tried wrapping it with JDOM 1.0. After using TagSoup to parse the HTML of cnn.com, upon passing it to JDOM,...
nezda
Mar 21, 2005
9:01 pm
235
Hi nezda, I solved this problem by subclassing Parser and getting rid of the comment. This removes all comments, but this doesn't matter in my case. If you try...
gernot_eger
Mar 22, 2005
2:54 pm
236
... I also had a problem with comments in some HTML that I'm trying to clean up. I'm using TagSoup to suck in and clean up HTML which is fed into a portlet;...
Brian Lalor
blalor76
Mar 22, 2005
3:13 pm
237
Group, I apologize if I'm asking something that has already been answered. I read through all the posts, but I'm still not finding a solution to what I'm trying...
eshipek
Mar 23, 2005
12:37 am
238
... If builder is an instance of org.ccil.cowan.tagsoup.Parser: // turn off all namespaces builder.setFeature(org.ccil.cowan.tagsoup.Parser.namespacesFeature, ...
Brian Lalor
blalor76
Mar 23, 2005
12:46 am
239
builder.setFeature(org.ccil.cowan.tagsoup.Parser.namespacesFeature, ... builder.setFeature(org.ccil.cowan.tagsoup.Parser.namespacePrefixesFeatur ... the ... ...
eshipek
Mar 23, 2005
4:58 pm
240
Brian/Group, I still need help and advice. I added a couple hacks to the newest version of tagsoup. Basically, I made it so that it would not setURI for...
eshipek
Mar 23, 2005
6:36 pm
241
Brian/Group: I think I have a lot to learn ;) I think part of the problem is that I'm not using the XMLWriter. And the Parser class punts to the scanner which...
eshipek
Mar 23, 2005
8:48 pm
242
I'm having a problem with TagSoup, both 0.10.2 and 1.0rc2: <br><!-- finish --> is getting stuck into the DOM like <br clear="none"> <!-- finish --> </br> I'm...
Brian Lalor
blalor76
Mar 23, 2005
10:37 pm
243
Some more things I tried: 1. Used XMLWriter as content handler inside of parser instead of the Parser itself. 2. Tweaked XMLWriter to always have uri and qName...
eshipek
Mar 23, 2005
10:46 pm
244
Group: I came across another example so I decided to try it. It is closer to what I'm trying to do. Instead of instantiating the SAXBuilder with ...
eshipek
Mar 23, 2005
11:09 pm
245
Hi eric, try another brute-force approach of mine to remove all namespaces: ... DocumentFactory d=reader.getDocumentFactory(); Document document =...
gernot.eger@...
gernot_eger
Mar 24, 2005
8:49 am
246
Gernot, I modified your example to use jdom instead of dom4j. It does remove the namespaces just fine. That is cool! I'm still having the problem with the...
eshipek
Mar 24, 2005
4:43 pm
248
Thank you for your timely, informative response. I apologize for my delayed response. I think _your_ problem may be addressed more directly by tagsoup...
nezda
Mar 26, 2005
4:13 pm
249
... Unfortunately, there are several different uses of the term "CDATA" in SGML. The above logic prevents "script" and "style" from being recognized as CDATA...
John Cowan
johnwcowan
Mar 27, 2005
2:46 am
250
... Absolutely correct: this is a bug in TagSoup. I will issue a new version when I'm able to (at the moment I'm having to deal with an illness in the family...
John Cowan
johnwcowan
Mar 27, 2005
3:25 am
251
... The version of XMLWriter that ships with TagSoup has a proper HTML mode. I don't know if JDOM can cope with a SAX ContentHandler, but if it can, that'll do...
John Cowan
johnwcowan
Mar 27, 2005
4:36 am
252
... That is a fairly permanent restriction, or at least I don't see a simple way around it. The trouble is that comments are outside the normal TagSoup...
John Cowan
johnwcowan
Mar 27, 2005
4:38 am
253
... Ok. As much as I like TagSoup for programmatically working with in-the-wild HTML, I recently discovered that JTidy also provides an internal DOM facility,...
Brian Lalor
blalor76
Mar 27, 2005
2:46 pm
254
Awesome tool John! This library fixes 99% of my HTML parsing woes. Was trying to build a page analyzer tool that grabs the visible content of a webpage. Then I...
gimmemyhoney
Apr 6, 2005
3:40 pm