... I don't know this light_html2xml, so I can't comment on what problems there might be in it. ... That's not what's happening. TagSoup reads the HTML file...
Hi John, I am in the process of implementing your suggestion but need a bit more guidance with the following questions: ( i ) Downloaded both...
Hi All, I am having difficulty parsing a namespaced HTML document using Saxon and the TagSoup parser. The relevant content of this document is as follows: ...
Hi All, I can confirm that the XPath using Saxon parser ("org.ccil.cowan.tagsoup.Parser") is working with default namespace. I made the mistake of assuming...
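A minimal sketch of the namespace issue discussed in this exchange. It is not the poster's code: it uses the JDK's built-in XPath 1.0 engine rather than Saxon, and a hand-written XHTML string stands in for TagSoup output (TagSoup places elements in the XHTML namespace by default). The class name `XhtmlXPathDemo`, the helper `evaluate`, and the prefix `h` are made up for illustration. Under those assumptions, an unprefixed path matches nothing, while a path whose prefix is bound to the XHTML namespace does.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XhtmlXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: parse the XML namespace-aware and evaluate an
    // XPath 1.0 expression, with the prefix "h" bound to the XHTML namespace.
    static String evaluate(String xml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for parser output: every element is in the XHTML namespace.
        String xml = "<html xmlns=\"" + XHTML_NS + "\">"
                + "<head><title>Hi</title></head><body/></html>";
        // Unprefixed names mean "no namespace" in XPath 1.0, so nothing matches.
        System.out.println(evaluate(xml, "/html/head/title").isEmpty());      // true
        // Prefixed names resolve through the NamespaceContext and match.
        System.out.println(evaluate(xml, "/h:html/h:head/h:title"));          // Hi
    }
}
```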
Hello, I'm using Flying Saucer (https://xhtmlrenderer.dev.java.net/) in my application together with TagSoup. Someone reported the following error and I'm not...
... It's very unusual: it means that someone is trying to invoke an abstract method, which normally is caught at compile time. A method was changed from...
Thanks for the quick answer John. I'll recompile it and see if it still throws the exception. Nicu ... compiled. ... html.stml. ... http://www.ccil.org/~cowan...
... I don't know. I built it as a library, and originally added the stand-alone application support for my own testing purposes, but I suspect that many...
I use tagsoup as one step in DeXSS. http://freshmeat.net/projects/dexss/ ... From: tagsoup-friends@yahoogroups.com [mailto:tagsoup-friends@yahoogroups.com] On...
I'm a library user. I used the standalone app for testing, but the projects where I've integrated TagSoup have always been as a library. Is there discussion...
I use TagSoup as a library. I use it to transform lists on the web into XML so it can be loaded into a database. I also use it in my main application to...
... Various "screen-scraping" jobs. E.g., this one which is just for fun: http://www.edavies.nildram.co.uk/#bumps More details at the bottom of this page: ...
Greetings. So, I'm using the tagSoup-1.2.jar file as a standalone program which I shell out to. What I'm trying to do here is convert in-the-wild HTML into...
... These are symptoms of specifying the wrong input encoding. You can't specify the input as UTF-8 unless the .html file *really is* encoded in UTF-8, or you...
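To illustrate the symptom described here, a small sketch using only the Java standard library (no TagSoup involved). The class name `EncodingMismatchDemo` and the sample string are made up for the example. It decodes the same text with a matching and a mismatching charset, in both directions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Decode raw bytes using the charset the caller claims they are in.
    static String decodeAs(byte[] bytes, Charset claimed) {
        return new String(bytes, claimed);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // Case 1: bytes really are UTF-8, but are read as ISO-8859-1.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String misread = decodeAs(utf8, StandardCharsets.ISO_8859_1);
        // The two-byte UTF-8 sequence C3 A9 turns into two separate characters.
        System.out.println(misread.equals("caf\u00c3\u00a9")); // true

        // Case 2: bytes are ISO-8859-1, but the input is declared as UTF-8.
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        String broken = decodeAs(latin1, StandardCharsets.UTF_8);
        // The lone byte E9 is not valid UTF-8, so a replacement character appears.
        System.out.println(broken.contains("\uFFFD")); // true

        // Matching charsets round-trip cleanly.
        System.out.println(decodeAs(utf8, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

The second case corresponds to declaring the input as UTF-8 when the file is not actually encoded that way.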
... Recommendation. ... So, I've tried a variety of combinations of --encoding and --output-encoding parameters. The input html does indeed seem to be utf8...
... So it is. ... $ tagsoup --encoding=utf-8 --output-encoding=utf-8 <index.html >index.xhtml ... TagSoup can't provide that. It interprets all entity and...
... http://www.ccil.org/~cowan ... Okay, I'll have to accept that as the TagSoup behavior. However, small update. On Linux, your command line example works...
... Just to make sure: did you verify actual output file contents (and similarly for input), or view using an app? I ask this because the most common problem...
... two systems? On the Windows machine, it's Java(TM) SE Runtime Environment (build 1.6.0_07-b06), the official JRE from Sun. On Ubuntu, it's OpenJDK Runtime...
... this platform? ... similarly for input), or view using an app? I ask this because the most common problem reported is usually caused by a viewing app ...
... just ... with ... Let's try to tackle this from a slightly different angle here. For a moment, let's pretend that I'm a random user who has just discovered...
... About the only thing I can think of, as a difference, is that the platform-specific default encoding may well differ between a stock Windows system vs....
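A short sketch of how to check for and sidestep a platform-dependent default, using only the Java standard library. The class name `ExplicitCharsetRead` and the helper `readAll` are made up for this example. It prints the JVM's default charset, which on Java 6-era runtimes like the ones mentioned in this thread depended on the operating system and locale (since JDK 18 the default is UTF-8), and then reads a stream with an explicitly named charset so the result does not depend on that default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {
    // Hypothetical helper: read a whole stream as text in the given charset,
    // never falling back to the platform default.
    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Compare this line's output on the two machines being debugged.
        System.out.println("Platform default: " + Charset.defaultCharset());

        // In-memory stand-in for a UTF-8 encoded input file.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(text.equals("caf\u00e9")); // true
    }
}
```

If the first line prints different charsets on the two systems, any step that reads or writes text without naming an encoding is a candidate for the discrepancy.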