My HTML is as follows: <pre>@misc{ granville-positive,author = "Andrew Granville", title = "On Positive Integers <=x With Prime Factors <=t log x", url =...
Hi, I am using TagSoup to convert HTML to XHTML. For this I got the TagSoup jar file and am using the command: java -jar tagsoup-1.0rc3.jar --files foo.html but it...
... I can only think that this has something to do with the JVM you are using or the version of Java. Can you send back the output of "java -version"? -- ...
Dear TagSoup friends, I am a contributor to the Poesia project (www.poesia-filter.org), which is an Internet content filter for kids. I am using TagSoup for...
somewhere sufficiently wild. This is relevant to the bugs you're trying to fix with the latest update ("This release cleans up long-standing problems with...
Garry Hill
garry@...
Jul 21, 2005 1:22 am
319
Hi all, I found this in an HTML page from the wild: <A HREF="http://i2as.idregie.com/c.php? s=396&w=468&h=60"> OK, that's quite brutish, but TagSoup fixes this...
... TagSoup is not aware of which attributes are supposed to contain URIs, so it just does minimal SGML/XML fixup, namely converting line-ends to spaces. -- ...
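Because the fixup described above leaves a space where the line end was, a caller that knows an attribute holds a URI can clean the value afterwards. The following is only a minimal sketch under that assumption; `HrefCleanup` and `stripUriWhitespace` are hypothetical names and are not part of TagSoup.

```java
// Hypothetical post-processing helper; not part of TagSoup.
final class HrefCleanup {
    private HrefCleanup() {}

    /** Removes all whitespace (spaces, tabs, line ends) from a URI attribute value. */
    static String stripUriWhitespace(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (!Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The HREF value quoted in the message above, with the line end already turned into a space.
        String raw = "http://i2as.idregie.com/c.php? s=396&w=468&h=60";
        System.out.println(stripUriWhitespace(raw));
    }
}
```

Whether dropping the whitespace is the right repair depends on the page; percent-encoding an embedded space is another option, and the choice is left to the caller.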
... Well, no one product can do everything. Jericho (thanks for the reference) is about examining and perhaps modifying the HTML at the lowest level. TagSoup...
Hi Johan, my problem has been resolved by the Jericho HTML parser. Actually I was using TagSoup for it, but Jericho is well documented and it does parsing at basic...
... TagSoup conforms to the behavior of SAX parsers, and requires no programmer-level documentation of its own except in the properties and features that can...
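Since the point above is that TagSoup follows the standard SAX interfaces, ordinary SAX client code should apply to it. The sketch below uses the JDK's built-in XML parser so that it runs without extra jars; `SaxElementCounter` is a hypothetical name. With the TagSoup jar on the classpath, its `org.ccil.cowan.tagsoup.Parser` could presumably be passed to `count` instead, since it is an `XMLReader`, but that substitution is an assumption and is not exercised here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Counts start-element events through the plain SAX interfaces.
final class SaxElementCounter extends DefaultHandler {
    private int elements;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    /** Parses the markup with the given reader and returns the number of elements seen. */
    static int count(XMLReader reader, String markup) {
        SaxElementCounter handler = new SaxElementCounter();
        try {
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.elements;
    }

    /** Convenience wrapper that uses the JDK's own (strict XML) parser. */
    static int countWithJdkParser(String markup) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return count(reader, markup);
        } catch (IllegalStateException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not create parser", e);
        }
    }

    public static void main(String[] args) {
        // Well-formed input only: the JDK parser, unlike TagSoup, rejects tag soup.
        System.out.println(countWithJdkParser("<html><body><p>Hi</p></body></html>"));
    }
}
```

The handler never refers to a concrete parser class, which is what makes the reader interchangeable.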
I'm a bit new to Java and TagSoup. I think I've got things set up to Tag-Soup-ize some files, but I'm not sure how to feed them through. Any pointers? ...
Hello, I was just wondering which Java character the entity " " is transformed to. I'm doing HTML to text conversion using TagSoup and I need to handle ...
Here is the bug fix for some of the missing setOutputProperties. It adds OMIT-XML-DECLARATION and METHOD=html.
This fix deprecates setHTMLMode in favor of...
The attached bug fix removes (most) dependencies on JDK 1.2 collections
and adds --help to the command line parser.
This patch is independent from the...
I have an issue using TagSoup. Never mind the content, focus on the tags and entities. With this input: <p><b>Monica :</b> <laughs> Oh yeah. </p> ...
Internet Explorer has a feature known as conditional comments, which have embedded markup that is parsed by IE but treated as comment by other browsers. In...
Hey all, First of all, let me say that I ran across TagSoup earlier this week and TSaxon today, and they rock! I'm trying to put together a simple scraper to...
... The trouble is that "&part" is a legitimate HTML entity reference to the Unicode character U+2202, PARTIAL DIFFERENTIAL. Since TagSoup does not know that...
... I'm going to reluctantly reject this, for three reasons: 1) I don't want to write a full conditional-comments interpreter; 2) The documentation shows that...
Hi again, I apologize for this idiot question. After checking, the delay was introduced in _my code_ by a DNS lookup delay. For information, this DNS lookup...
... Speed comparisons aren't meaningful on different machines. Try downloading the page first using wget or curl, and then run TagSoup against it locally. ...
Interesting usage. I developed a set of XSS filters as SAX2 filters on top of TagSoup. Do you think that the Poesia project would be interested in XSS filters...
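To illustrate what a SAX2 filter layered on a parser can look like, here is a rough sketch built on the JDK's `XMLFilterImpl`. It is not the filter set mentioned above; `ScriptStrippingFilter` and `sanitize` are hypothetical names, the JDK parser stands in for TagSoup so the example runs on its own, and dropping `script` elements and `on*` attributes is far from a complete XSS defense.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops <script> subtrees and on* event-handler attributes.
final class ScriptStrippingFilter extends XMLFilterImpl {
    private int skipDepth; // greater than 0 while inside a <script> element

    ScriptStrippingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || "script".equalsIgnoreCase(qName)) {
            skipDepth++;
            return;
        }
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            if (!atts.getQName(i).toLowerCase(Locale.ROOT).startsWith("on")) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i), atts.getQName(i),
                        atts.getType(i), atts.getValue(i));
            }
        }
        super.startElement(uri, localName, qName, kept);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Runs the filter over well-formed markup and echoes the surviving events (no escaping; demo only). */
    static String sanitize(String markup) {
        StringBuilder out = new StringBuilder();
        try {
            XMLReader base = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ScriptStrippingFilter filter = new ScriptStrippingFilter(base);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    out.append('<').append(q);
                    for (int i = 0; i < a.getLength(); i++) {
                        out.append(' ').append(a.getQName(i))
                           .append("=\"").append(a.getValue(i)).append('"');
                    }
                    out.append('>');
                }

                @Override
                public void endElement(String u, String l, String q) {
                    out.append("</").append(q).append('>');
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
            filter.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("filtering failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("<p onclick=\"evil()\">Hi<script>alert(1)</script> there</p>"));
    }
}
```

Because the filter is itself an `XMLReader`, several such filters can be chained, each taking the previous one as its parent.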
... Hi Leigh, XSS filtering is a good idea, but the main purpose of Poesia is porn filtering. As we are in alpha development, security filtering is not our ...
For the last couple of days I have tried to access the TagSoup website at the following addresses without any luck. http://mercury.ccil.org/~cowan/XML/tagsoup/...