I'm a bit new to Java and TagSoup. I think I've got things set up to Tag-Soup-ize some files, but I'm not sure how to feed them through. Any pointers? ...
Hello, I'm just wondering which Java character the following entity " " is transformed to. I'm doing HTML-to-text conversion using TagSoup and I need to handle ...
Here is the bug fix for some of the missing setOutputProperties. It adds OMIT-XML-DECLARATION and METHOD=html.
This fix deprecates setHTMLMode in favor of...
The attached bug fix removes (most) dependencies on JDK 1.2 collections
and adds --help to the command line parser.
This patch is independent from the...
I have an issue using TagSoup. Never mind the content, focus on the tags and entities. With this input: <p><b>Monica :</b> <laughs> Oh yeah. </p> ...
Internet Explorer has a feature known as conditional comments, which contain embedded markup that is parsed by IE but treated as a comment by other browsers. In...
Hey all, First of all, let me say that I ran across TagSoup earlier this week and TSaxon today, and they rock! I'm trying to put together a simple scraper to...
... The trouble is that "&part" is a legitimate HTML entity reference to the Unicode character U+2202, PARTIAL DIFFERENTIAL. Since TagSoup does not know that...
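To illustrate why a bare "&part" can be troublesome, here is a minimal sketch of a lenient entity expander that accepts entity names without a terminating semicolon. It is not TagSoup's actual algorithm; the class name EntityPrefixSketch, the method expandLeniently, and the two-entry table are all hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class EntityPrefixSketch {

    // Tiny illustrative table (an assumption for this sketch);
    // a real HTML entity table has hundreds of entries.
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("part", '\u2202'); // U+2202 PARTIAL DIFFERENTIAL
        ENTITIES.put("amp", '&');
    }

    /**
     * Expands the longest known entity name that follows each '&'.
     * The terminating semicolon is optional, which is what makes
     * text such as "&partner" ambiguous.
     */
    static String expandLeniently(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '&') {
                int bestEnd = -1;
                Character replacement = null;
                // Try ever longer names and remember the longest match.
                for (int end = i + 2; end <= text.length(); end++) {
                    if (!Character.isLetterOrDigit(text.charAt(end - 1))) {
                        break;
                    }
                    Character hit = ENTITIES.get(text.substring(i + 1, end));
                    if (hit != null) {
                        bestEnd = end;
                        replacement = hit;
                    }
                }
                if (replacement != null) {
                    out.append(replacement);
                    i = bestEnd;
                    // Swallow the semicolon when one is present.
                    if (i < text.length() && text.charAt(i) == ';') {
                        i++;
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandLeniently("page?id=1&partner=acme"));
    }
}
```

With this sketch, a query string such as "page?id=1&partner=acme" loses its "&part" to the entity, while "&part;" and "&amp;" expand as intended.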
... I'm going to reluctantly reject this, for three reasons: 1) I don't want to write a full conditional-comments interpreter; 2) The documentation shows that...
Hi again, I apologize for this idiotic question. After checking, the delay was introduced in _my code_ by a DNS lookup delay. For information, this DNS lookup...
... Speed comparisons aren't meaningful on different machines. Try downloading the page first using wget or curl, and then run TagSoup against it locally. ...
Interesting usage. I developed a set of XSS filters as SAX2 filters on top of TagSoup. Do you think that the Poesia project would be interested in XSS filters...
... Hi Leigh, XSS filtering is a good idea, but the main purpose of Poesia is porn filtering. As we are in alpha development, security filtering is not our ...
For the last couple of days I have tried to access the TagSoup website at the following addresses without any luck. http://mercury.ccil.org/~cowan/XML/tagsoup/...
Hi all, I'm trying to use TagSoup to process a template HTML file into valid XML so that I can output an XSLT file; because XSLT will be used to manipulate the...
... [snip] ... You don't make clear whether you are using TagSoup from the command line or as a SAX parser library. If from the command line, use the --any ...
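For the library route, a minimal sketch of feeding markup through a SAX2 XMLReader and collecting element names might look like the following. The class and method names (SaxFeedSketch, elementNames, elementNamesWithJdkParser) are made up for this example. The demo uses the JDK's built-in parser as a stand-in so that it runs without extra jars; since TagSoup's parser class org.ccil.cowan.tagsoup.Parser implements XMLReader, passing an instance of it to elementNames instead should be the only change needed, assuming the TagSoup jar is on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxFeedSketch {

    /** Feeds markup through any SAX2 XMLReader; returns element names in document order. */
    static List<String> elementNames(XMLReader reader, String markup)
            throws IOException, SAXException {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }

    /** Convenience wrapper using the JDK parser as a stand-in reader. */
    static List<String> elementNamesWithJdkParser(String markup) {
        try {
            // Stand-in only: this parser requires well-formed XML.
            // With TagSoup available, pass
            // new org.ccil.cowan.tagsoup.Parser() to elementNames instead.
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return elementNames(reader, markup);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            elementNamesWithJdkParser("<html><body><p>Hello</p></body></html>"));
        // prints [html, body, p]
    }
}
```

Writing the helper against the XMLReader interface rather than a concrete parser keeps the handler code identical whichever parser supplies the events.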
Hello Everybody, I am just a Java greenhorn ;-) I am not able to build the tagsoup-1.0rc3 source from the build.xml. I get exactly the same problem as described in...
Hello John, Thanks for the clue, but I still do not know what to do. Do I have to install another version of xalan? Where would you start if you were in my place? ...
... that's me again ;-) I have solved the problem. It really seems to be a bug in Java 1.5. I have installed 1.4 and it worked perfectly. Many regards Jan ... ...
... Indeed. I should probably put this on the Web page, now that people will be using 5.0 out of the box a lot more. AFAIK, TagSoup works fine under 5.0 once...
... That is exactly what I have done. It works pretty well! ... That is a very good idea. It is the very first impression which one is confronted with. If it...
Hello all, I'm just getting started using the tagsoup library to parse some content. What I'm discovering is that for many HTML documents I get a parsing error...
... That can't be a TagSoup error, as TagSoup never reports any errors (except low-level IOExceptions when a file cannot be read or something of the sort)....