Hello John, Thanks for the clue, but I still do not know what to do. Do I have to install another version of xalan? Where would you start if you were in my place? ...
... that's me again ;-) I have solved the problem. It really seems to be a bug in Java 1.5. I have installed 1.4 and it worked perfectly. Many regards Jan ... ...
... Indeed. I should probably put this on the Web page, now that people will be using 5.0 out of the box a lot more. AFAIK, TagSoup works fine under 5.0 once...
... That is exactly what I have done. It works just fine! ... That is a very good idea. It is the very first impression that one is confronted with. If it...
Hello all, I'm just getting started using the tagsoup library to parse some content. What I'm discovering is that for many HTML documents I get a parsing error...
... That can't be a TagSoup error, as TagSoup never reports any errors (except low-level IOExceptions when a file cannot be read or something of the sort)....
Last night in Albany at the CDJDN meeting someone complained about JavaScript errors in my XOM pages. That surprised me because I didn't think I had any...
Elliotte Harold
elharo@...
Feb 10, 2006 2:34 am
363
... Because HTML is an SGML application, not an XML one. The script and style elements are of type CDATA, which means that after the start-tag, the only...
... The problem is I do want well-formed XHTML output. i.e. I do want empty elements to be closed (or use an empty-element tag). Is there any way to get both? ...
Elliotte Harold
elharo@...
Feb 10, 2006 9:49 am
365
... That would be self-contradictory. The "comment" is not really an HTML comment, but if I left the < unescaped it would become an XML comment and disappear...
John, Unsurprisingly, you were right. Thanks again for the quick response. Robert ... -- Robert Konigsberg konigsberg@... "Uh Oh. This does not look good...
Hi, I noticed that the TagSoup parser chops any attribute value which is longer than 1980 characters. I think the HTML spec mentions 1024 characters as the...
I'm new to TagSoup. I really like the command line interface and its speed / features. I didn't notice much documentation for using the API so I'm hoping you ...
... [snip] ... Currently no, but you can change the size of the array theOutputBuffer in src/templates/org/ccil/cowan/tagsoup/HTMLScanner.java from 2000 to...
... This is the part that TagSoup provides; it is a SAX parser. ... You can use XOM or any other tree model you like; all of them support external SAX parsers....
Ok. So, unless I missed it on the website, I didn't see something that documents using the API to import the file and parse it. Is there documentation, or can...
... If you look at CommandLine.java, you will see how it's done there. -- John Cowan cowan@... www.ccil.org/~cowan www.ap.org If I have seen farther...
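For readers who want a starting point before reading CommandLine.java, here is a minimal sketch of wiring a SAX XMLReader to a ContentHandler. It is not taken from the TagSoup sources. The class and method names (SaxWiringSketch, elementNames, jdkReader) are made up for illustration, and the JDK's built-in XML parser stands in for TagSoup so that the sketch runs without extra jars. Assuming the TagSoup jar is on the classpath, an instance of org.ccil.cowan.tagsoup.Parser could be passed in place of jdkReader(), since it is also an XMLReader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxWiringSketch {

    // Parses the markup with the given reader and returns the names of
    // all start-tags in document order.
    static List<String> elementNames(XMLReader reader, String markup) throws Exception {
        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }

    // Stand-in reader from the JDK. It only accepts well-formed XML;
    // swap in a TagSoup Parser instance to accept real-world HTML.
    static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            elementNames(jdkReader(), "<html><body><p>hi</p></body></html>"));
    }
}
```

The handler never needs to know which parser produced the events, which is why any SAX-based tree builder can sit on top of an external parser.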
Hi, I have just switched from Tidy to TagSoup after running into Tidy's inherent bugs and heavyweight nature. Following is what I need to do: 1) I have dirty text...
... There are two approaches. You can write a program which processes the SAX events directly and retains the ones you want while discarding the ones you...
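The first approach (processing the SAX events directly and discarding the unwanted ones) can be sketched with the standard org.xml.sax.helpers.XMLFilterImpl class. This is only an illustration under stated assumptions: the names DropElementFilter and textOf are hypothetical, the choice to drop whole subtrees is one possible policy, and the JDK's built-in parser is used as the event source so the example runs on its own. A TagSoup Parser could be used as the parent reader instead.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops the named elements together with everything
// inside them and passes all other events through unchanged.
class DropElementFilter extends XMLFilterImpl {
    private final Set<String> dropped;
    private int depth = 0; // greater than 0 while inside a dropped element

    DropElementFilter(XMLReader parent, Set<String> dropped) {
        super(parent);
        this.dropped = dropped;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (depth > 0 || dropped.contains(qName)) {
            depth++;          // entering (or nesting inside) a dropped subtree
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (depth > 0) {
            depth--;          // leaving one level of a dropped subtree
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Runs the filter over the markup and returns the surviving text.
    static String textOf(String markup, Set<String> dropped) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader source = factory.newSAXParser().getXMLReader();
        DropElementFilter filter = new DropElementFilter(source, dropped);
        StringBuilder text = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(markup)));
        return text.toString();
    }
}
```

Because the filter is itself an XMLReader, it can be handed to a serializer or tree builder exactly as the unfiltered parser would be.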
Can you please tell me where HTML entities are transformed in the source code? Actually, I don't want to transform XML entities (&, < etc.) in the...
... All SAX parsers convert entities to characters when returning values to the caller. It's up to the caller, if it intends to produce output as XML, to...
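To make the caller's side of this concrete: by the time text reaches a ContentHandler it holds plain characters, so a caller that writes XML has to turn the markup-significant ones back into references. The following is a minimal sketch of that step for element content only. The class and method names are hypothetical, and attribute values would additionally need their quote characters escaped.

```java
// Hypothetical helper, for illustration only.
class XmlEscapeSketch {

    // Escapes the characters that are significant in XML element content.
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("AT&T <b>"));
    }
}
```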
Thanks John. I am able to do it with XMLWriter or a regex. But when I remove "<entity name='amp' codepoint='0026'/>" from definitions.html.tssl then it...
... When TagSoup sees an entity that's not in its tables, it returns the entity reference as text (what else is there to do, given the Keep On Truckin'...
Hi, I'm an experienced Java programmer but I've never written a SAX application from scratch and I'm not sure how it should all hang together. I'm trying to...
... Spiders aren't exactly trivial code, but thanks for thinking of TagSoup. You might want to reuse someone else's spider and then use TagSoup to postprocess...
... It's mishandling EOF somewhere. I hope to be able to do some more work on TagSoup soon. I've recently changed jobs, which has been stressful. Let me take...
Hi, in order to evaluate TagSoup, I downloaded it and tried it with "java -jar tagsoup-1.0rc3 --files --html ..." on some HTML pages saved with 'save page...