Hi, I am a new user of TagSoup. Here is the problem I am targeting: 1) I need to intercept the response from the server. 2) Modify the response as...
Hi, one thing I noticed: the problem is not that it's wrapped in comments; it's that it's converting < to &lt; inside the script tag. Now the problem is...
... TagSoup has to do that, precisely because the so-called "comment" must be treated by XSLT as text. If TagSoup passed the sequences as <!-- and -->, then ...
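To make the point concrete without depending on TagSoup, here is a JDK-only sketch (the class name ScriptEscapeDemo is made up for illustration). It puts comment-like text inside a script element as an ordinary text node and serializes it with the standard identity transformer. Because that text is character data rather than markup, the serializer has to write the leading < as &lt;, much as in the TagSoup output described above.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ScriptEscapeDemo {
    // Builds <script>...</script> whose content is a plain text node
    // and returns its XML serialization.
    public static String serializeScript(String scriptText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element script = doc.createElement("script");
        // "<!--" here is character data, not a comment node, so the
        // serializer must escape the '<' to keep the output well-formed.
        script.appendChild(doc.createTextNode(scriptText));
        doc.appendChild(script);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serializeScript("<!-- if (a < b) f(); -->"));
    }
}
```

An XSLT stylesheet working on the tree still sees the plain characters `<!--`; the `&lt;` only exists in the serialized form.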
Thanks John!!! I also thought the same and it worked. Hitesh
Is there a simple way of getting TagSoup to treat a legal html element as a bogon and eliminate it from the output stream? I'm working with html that uses <dd>...
... Well, it's easy to do that in XSLT, but if you don't have an XSLT step in your pipeline already, I understand. ... There is no way currently to remove an...
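If there is no XSLT step, one alternative (not something TagSoup offers itself) is a small SAX filter placed between the parser and your content handler. The sketch below uses a hypothetical DropElementFilter built on the JDK's XMLFilterImpl: it swallows the start and end tags of one element type but passes its content through. The strip helper uses the JDK's own XML parser so the example runs on its own; since TagSoup's Parser is a SAX XMLReader, it could be passed as the parent instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Drops the start and end tags of one element type; its content is kept. */
public class DropElementFilter extends XMLFilterImpl {
    private final String dropped;

    public DropElementFilter(XMLReader parent, String dropped) {
        super(parent);
        this.dropped = dropped;
    }

    // Without namespace processing localName may be empty, so fall back to qName.
    private boolean matches(String localName, String qName) {
        String name = localName.isEmpty() ? qName : localName;
        return name.equalsIgnoreCase(dropped);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (!matches(localName, qName)) super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (!matches(localName, qName)) super.endElement(uri, localName, qName);
    }

    /** Demo helper: parse XML with the JDK parser, filter it, serialize the result. */
    public static String strip(String xml, String element) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        XMLFilterImpl filter = new DropElementFilter(parser, element);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

To remove the element together with everything inside it, the filter would instead count nesting depth and suppress all events while inside a dropped element.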
... Yes, there is. The simplest approach is to implement a trivial JapaneseAutoDetector class, something like this: class JapaneseAutoDetector implements...
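A minimal sketch of such a class, under assumptions: the interface below is a local stand-in for TagSoup's org.ccil.cowan.tagsoup.AutoDetector, whose shape is assumed here to be a single method turning the raw InputStream into a Reader (check it against your TagSoup version and implement the real interface instead). The default charset "JISAutoDetect" is a JDK decoder that guesses among Shift_JIS, EUC-JP and ISO-2022-JP; it may be missing from stripped-down runtimes.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

/**
 * Local stand-in for TagSoup's AutoDetector interface (assumed shape).
 * In real code, implement org.ccil.cowan.tagsoup.AutoDetector and drop this.
 */
interface AutoDetector {
    Reader autoDetectingReader(InputStream in);
}

/** Always decodes with one fixed charset; any guessing is left to that decoder. */
public class JapaneseAutoDetector implements AutoDetector {
    private final Charset charset;

    public JapaneseAutoDetector() {
        // Throws UnsupportedCharsetException if the runtime lacks JISAutoDetect.
        this(Charset.forName("JISAutoDetect"));
    }

    public JapaneseAutoDetector(Charset charset) {
        this.charset = charset;
    }

    @Override
    public Reader autoDetectingReader(InputStream in) {
        return new InputStreamReader(in, charset);
    }
}
```

With TagSoup on the classpath, the detector would then be registered on the parser through its auto-detector property, probably something like `parser.setProperty(Parser.autoDetectorProperty, new JapaneseAutoDetector())`; verify the property constant in your version.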
I know that an eventual goal of TagSoup is to be configurable to clean up input other than HTML. Is anyone using it for that now, and how much configuration...
I had this problem too. It turned out to be related to Java 1.5. You should have no problems compiling it with a Java 1.4 compiler (I didn't have any). I think it's...
Hello, I was just getting started using TagSoup and tried wrapping it with JDOM 1.0. After using TagSoup to parse the HTML of cnn.com, upon passing it to JDOM,...
Hi nezda, I solved this problem by subclassing Parser and getting rid of the comments. This removes all comments, but that doesn't matter in my case. If you try...
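If dropping every comment is too blunt, one alternative (not the approach described above) is to repair the comment text before it reaches JDOM. This assumes the JDOM complaint is the usual one about comment data: XML 1.0 forbids "--" inside a comment and a "-" as its last character. CommentSanitizer is a hypothetical helper; it would be called wherever comment text is passed along, for instance from a LexicalHandler.comment override.

```java
/**
 * Hypothetical helper: rewrites comment text so it is legal in XML 1.0,
 * which forbids "--" inside a comment and a '-' as its last character.
 */
public final class CommentSanitizer {
    private CommentSanitizer() {}

    public static String sanitize(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 4);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c);
            // Break up every "--" by inserting a space after each hyphen
            // that is immediately followed by another hyphen.
            if (c == '-' && i + 1 < text.length() && text.charAt(i + 1) == '-') {
                sb.append(' ');
            }
        }
        // A trailing '-' would run into the closing "-->" delimiter.
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) == '-') {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```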
... I also had a problem with comments in some HTML that I'm trying to clean up. I'm using TagSoup to suck in and clean up HTML which is fed into a portlet;...
Group, I apologize if I'm asking something that has already been answered. I read through all the posts, but I'm still not finding a solution to what I'm trying...
... If builder is an instance of org.ccil.cowan.tagsoup.Parser: // turn off all namespaces builder.setFeature(org.ccil.cowan.tagsoup.Parser.namespacesFeature, ...
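As far as I know, TagSoup's Parser.namespacesFeature is the standard SAX2 feature URI "http://xml.org/sax/features/namespaces". The sketch below uses the JDK's own SAX parser (not TagSoup) to show what turning that feature off means for a ContentHandler: the namespace URI comes through empty and the prefixed qualified name is reported as-is. NamespacesOffDemo is a made-up name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespacesOffDemo {
    /** Returns "uri|qName" for every start tag, parsed with namespaces turned off. */
    public static List<String> startTags(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 feature; when false, no namespace processing is done.
        reader.setFeature("http://xml.org/sax/features/namespaces", false);
        List<String> seen = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                seen.add(uri + "|" + qName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

Note that this only stops namespace processing; the prefixes stay in the qualified names, so a later step may still need to strip them.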
Brian/Group, I still need help and advice. I added a couple of hacks to the newest version of TagSoup. Basically, I made it so that it would not setURI for...
Brian/Group: I think I have a lot to learn ;) I think part of the problem is that I'm not using the XMLWriter. And the Parser class punts to the scanner which...
I'm having a problem with TagSoup, both 0.10.2 and 1.0rc2: <br><!-- finish --> is getting stuck into the DOM like <br clear="none"> <!-- finish --> </br> I'm...
Some more things I tried: 1. Used XMLWriter as the content handler inside the parser instead of the Parser itself. 2. Tweaked XMLWriter to always have uri and qName...
Gernot, I modified your example to use JDOM instead of dom4j. It does remove the namespaces just fine. That is cool! I'm still having the problem with the...
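The namespace-stripping example discussed here isn't reproduced in this digest. As a rough illustration of the same idea (a hypothetical StripNamespacesFilter, not Gernot's code), this JDK-only SAX filter forwards every element and attribute in no namespace with its prefix removed, and swallows prefix mappings so no xmlns declarations reach the handler. The strip helper uses the JDK parser so it runs on its own; with TagSoup, a TagSoup Parser would be the filter's parent instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Forwards elements and attributes in no namespace, with prefixes removed. */
public class StripNamespacesFilter extends XMLFilterImpl {
    public StripNamespacesFilter(XMLReader parent) {
        super(parent);
    }

    // Prefer the qualified name when present, else the local name; drop "prefix:".
    private static String plainName(String localName, String qName) {
        String name = (qName == null || qName.isEmpty()) ? localName : qName;
        int colon = name.indexOf(':');
        return colon < 0 ? name : name.substring(colon + 1);
    }

    @Override public void startPrefixMapping(String prefix, String uri) { }
    @Override public void endPrefixMapping(String prefix) { }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl plain = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            String q = atts.getQName(i);
            if (q.equals("xmlns") || q.startsWith("xmlns:")) continue; // drop declarations
            String name = plainName(atts.getLocalName(i), q);
            if (plain.getIndex(name) >= 0) continue; // keep first if a:x and b:x collide
            plain.addAttribute("", name, name, atts.getType(i), atts.getValue(i));
        }
        String name = plainName(localName, qName);
        super.startElement("", name, name, plain);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String name = plainName(localName, qName);
        super.endElement("", name, name);
    }

    /** Demo helper: parse with the JDK parser, strip namespaces, serialize. */
    public static String strip(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(new StripNamespacesFilter(parser),
                new InputSource(new StringReader(xml))), new StreamResult(out));
        return out.toString();
    }
}
```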
Thank you for your timely, informative response. I apologize for my delayed response. I think _your_ problem may be addressed more directly by tagsoup...
... Unfortunately, there are several different uses of the term "CDATA" in SGML. The above logic prevents "script" and "style" from being recognized as CDATA...
... Absolutely correct: this is a bug in TagSoup. I will issue a new version when I'm able to (at the moment I'm having to deal with an illness in the family...