tagsoup-friends · Friends of TagSoup
Messages

Messages 998 - 1029 of 1386
998
Hi John, everyone, Version 1.2 of TagSoup occasionally throws an exception when trying to push back data to the internal PushbackReader. Examples of failing...
Dawid Weiss
dawid_weiss
Feb 5, 2008
8:19 am
999
... Thank you very much, especially for the failing input. There was an earlier bug report to this effect, but no examples were forthcoming. ... It should...
John Cowan
johnwcowan
Feb 5, 2008
1:56 pm
1000
... My pleasure. This issue can be avoided by passing a custom PushbackReader as the input. See my other e-mail about nested tags; I think that one can be...
Dawid Weiss
dawid_weiss
Feb 5, 2008
2:32 pm
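A minimal sketch of the workaround mentioned in message 1000, using only the JDK. It shows that a PushbackReader with too small a pushback buffer fails on unread, while one created with a larger capacity does not. The class name, method names, and the capacity of 16 are illustrative assumptions, not values from the thread. In a TagSoup setting the reader would presumably be handed to the parser through an InputSource character stream; that step is not shown here.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

final class PushbackCapacityDemo {

    // Wraps any Reader in a PushbackReader with an explicit pushback capacity.
    // The capacity to choose is an assumption; the thread does not give a figure.
    static PushbackReader withCapacity(Reader in, int capacity) {
        return new PushbackReader(in, capacity);
    }

    // Reads `count` characters, then tries to push all of them back at once.
    // Returns false if the pushback buffer is too small to take them.
    static boolean canUnread(String text, int capacity, int count) {
        try (PushbackReader r = withCapacity(new StringReader(text), capacity)) {
            char[] buf = new char[count];
            int n = r.read(buf, 0, count);
            if (n != count) {
                return false; // not enough input to run the experiment
            }
            r.unread(buf, 0, n); // throws IOException when the buffer overflows
            return true;
        } catch (IOException overflow) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "AT&T\r\nnext line";
        System.out.println("capacity 1:  " + canUnread(input, 1, 3));
        System.out.println("capacity 16: " + canUnread(input, 16, 3));
    }
}
```

The default single-argument PushbackReader constructor allows only one character of pushback, which is why an explicit capacity is passed here.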
1001
... Problem solved! The issue arises when an & appears at the end of a line, and the line terminator is either CR-LF (Windows) or CR alone (Mac Classic), as in...
John Cowan
johnwcowan
Feb 6, 2008
1:30 am
1002
Hi John and tagsoup-friends, would it be possible to briefly describe (or provide reliable pointers to) a way to create an instance of...
Godmar Back
godmar@...
Feb 6, 2008
4:02 am
1003
... I don't know of any HTML DOMs that have pluggable parsers, since there is no standard interface for streaming HTML parsers. Most people use XML DOMs or...
John Cowan
johnwcowan
Feb 6, 2008
5:14 am
1004
... I'll investigate that. I was intrigued by your suggestion in your 2002 talk that "SAX-to-DOM converters" were abundant; apparently, this doesn't include...
Godmar Back
godmar@...
Feb 6, 2008
5:53 am
1005
... SAX is purely an XML standard, unless you are using an HTML-to-SAX parser like Cyberneko, TagSoup, or JTidy. ... The HTML DOM doesn't really buy you much...
John Cowan
johnwcowan
Feb 6, 2008
6:38 am
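One common way to get a DOM out of any SAX event source, sketched with the JDK's JAXP classes only: an identity Transformer copies the SAX events into a DOMResult. The class and method names are hypothetical. The JDK's own XML parser stands in as the XMLReader so the sketch runs without extra jars; with TagSoup on the classpath, an instance of org.ccil.cowan.tagsoup.Parser could presumably be passed in its place, since it is also an XMLReader.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class SaxToDomDemo {

    // Builds a DOM tree from whatever SAX events the given XMLReader emits.
    // Checked exceptions are wrapped to keep the sketch short.
    static Document toDom(XMLReader reader, String markup) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            InputSource input = new InputSource(new StringReader(markup));
            identity.transform(new SAXSource(reader, input), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("SAX-to-DOM conversion failed", e);
        }
    }

    // Stand-in event source: the JDK's XML parser (requires well-formed input).
    static XMLReader defaultReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDom(defaultReader(),
                "<html><body><p>a</p><p>b</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getElementsByTagName("p").getLength());
    }
}
```

The result is a plain XML DOM rather than an HTML DOM, which matches the approach the reply describes as what most people use.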
1006
Confirmed, works as advertised -- thanks John. For some reason my other bug report didn't get through to the list. I will try to re-send it to start ...
Dawid Weiss
dawid_weiss
Feb 6, 2008
9:19 am
1007
Another bug, this time more serious and with no apparent workaround (sorry, John). Try to run: java -jar tagsoup-1.2.jar error-67.txt > out on the ZIPped HTML...
Dawid Weiss
dawid_weiss
Feb 6, 2008
9:22 am
1008
... I've been using Castor quite a bit for processing structured XML and in my mind had hoped that HTML DOM would provide a binding that would be similarly...
Godmar Back
godmar@...
Feb 6, 2008
1:03 pm
1009
... I admit that this case is extreme (the 375K input balloons to a 15M output file), but not actually erroneous. There are 485 <small> tags in the input and...
John Cowan
johnwcowan
Feb 6, 2008
3:19 pm
1010
... It is my understanding that there's also NUX (http://dsd.lbl.gov/nux/index.html ), which embeds XOM. Do you recommend using the NUX wrappers/packaging or...
Godmar Back
godmar@...
Feb 6, 2008
3:26 pm
1011
... Right... I have more of these -- when you make real crawls, you pull some real #*&^ out of the Web. ... Ok, I admit I never thought of the semantics of...
Dawid Weiss
dawid_weiss
Feb 6, 2008
3:31 pm
1012
I've implemented three approaches to this. 1) TagSoup parses, XOM represents XHTML in XML, Output via TagSoup serializer is XHTML 1.0. I had to add a number of...
benson_margulies
benson_margu...
Feb 6, 2008
4:01 pm
1013
... Thanks for these answers! I actually don't need to output the tree, I'm just interested in analyzing it conveniently - say feed it to an expert system such...
Godmar Back
godmar@...
Feb 6, 2008
4:07 pm
1014
... I have done both with good success, and I think it comes down to whether you need just basic tree-access with xpath (which XOM can do well), or more...
Tatu Saloranta
cowtowncoder
Feb 6, 2008
6:12 pm
1015
... Fixing the lower-case doctype bug turns out to be trivial: change the "equals" to "equalsIgnoreCase" in line 837 of Parser.java. ... Still working on this...
John Cowan
johnwcowan
Feb 7, 2008
9:39 pm
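The difference between the two comparisons named in message 1015 can be sketched in isolation. This is not the code at line 837 of Parser.java, which the snippet does not show; the class and method names below are hypothetical and only illustrate why a lower-case doctype keyword slips past a case-sensitive check.

```java
final class DoctypeMatchDemo {

    // Case-sensitive check: a lower-case "doctype" keyword is not recognized.
    static boolean isDoctypeStrict(String keyword) {
        return "DOCTYPE".equals(keyword);
    }

    // Case-insensitive check: accepts "DOCTYPE", "doctype", "DocType", and so on.
    static boolean isDoctypeLenient(String keyword) {
        return "DOCTYPE".equalsIgnoreCase(keyword);
    }

    public static void main(String[] args) {
        for (String keyword : new String[] {"DOCTYPE", "doctype", "DocType"}) {
            System.out.println(keyword
                    + " strict=" + isDoctypeStrict(keyword)
                    + " lenient=" + isDoctypeLenient(keyword));
        }
    }
}
```

Putting the literal on the left-hand side also keeps both checks safe when the keyword is null.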
1016
... No, not a problem: I could introduce a new element property in TSSL which says "terminate all restartable elements". The question is whether this produces...
John Cowan
johnwcowan
Feb 7, 2008
9:51 pm
1017
... "scripsit"? What language is this :) ... I am not really an expert in the HTML spec (don't even know if this is anywhere in the spec), but intuitively an...
Dawid Weiss
dawid_weiss
Feb 8, 2008
7:52 am
1018
... Latin: "has written", or more accurately "has completed writing". ... Correct, I think. ... Very painful, which is why I've avoided it. -- A: "Spiro...
John Cowan
johnwcowan
Feb 8, 2008
2:47 pm
1019
Hi, I want to parse HTML files and make customizable XML files corresponding to those HTML files. How can this be done? Any suggestions would be of great help....
akhil192502
Feb 13, 2008
5:42 am
1020
Hi, My HTML files have data in Japanese. How can I parse them using TagSoup? Regards, Akhilesh Aggarwal...
akhil192502
Feb 13, 2008
5:45 am
1021
... You will have to specify the correct encoding, such as Shift-JIS or ISO-2022-JP, and it needs to be one that your Java VM understands. -- "Well, I'm back."...
John Cowan
johnwcowan
Feb 13, 2008
5:56 am
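A minimal sketch of the two points in the reply above: checking that the Java VM understands an encoding, and decoding bytes with it through a Reader. Only JDK classes are used; the class and method names are hypothetical. The name "Shift_JIS" is the spelling Java uses as the canonical charset name. A Reader built this way could then be given to a SAX parser through InputSource.setCharacterStream; alternatively, InputSource.setEncoding takes the encoding name for a byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class JapaneseEncodingDemo {

    // True if this Java VM can decode the named encoding.
    static boolean vmUnderstands(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException badName) {
            return false; // null or syntactically illegal charset name
        }
    }

    // Decodes the bytes with an explicitly chosen encoding.
    static String decode(byte[] bytes, String encoding) {
        if (!vmUnderstands(encoding)) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding);
        }
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(encoding))) {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "\u65E5\u672C\u8A9E"; // the word "Japanese" in Japanese
        for (String name : new String[] {"Shift_JIS", "ISO-2022-JP"}) {
            if (!vmUnderstands(name)) {
                System.out.println(name + ": not supported by this VM");
                continue;
            }
            byte[] encoded = sample.getBytes(Charset.forName(name));
            System.out.println(name + ": round trip ok = "
                    + sample.equals(decode(encoded, name)));
        }
    }
}
```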
1022
... I recommend that you use TSaxon or Saxon-B or any other XSLT processor that can use TagSoup to parse its input. -- John Cowan cowan@......
John Cowan
johnwcowan
Feb 13, 2008
5:58 am
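A rough sketch of the general pattern behind this recommendation, using the XSLT processor bundled with the JDK rather than TSaxon or Saxon-B: a stylesheet is applied to the SAX events of an XMLReader and produces custom XML. The stylesheet, the output element names (titles, title), and the class and method names are all made up for illustration. The JDK's XML parser stands in as the reader so the sketch runs without extra jars; a TagSoup org.ccil.cowan.tagsoup.Parser instance could presumably be passed instead to handle real-world HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class HtmlToCustomXmlDemo {

    // Hypothetical stylesheet: collects every h1 into a custom <titles> document.
    // local-name() is used so the match works with or without an XHTML namespace.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<titles>"
        + "<xsl:for-each select='//*[local-name()=\"h1\"]'>"
        + "<title><xsl:value-of select='.'/></title>"
        + "</xsl:for-each>"
        + "</titles>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the SAX events produced by the given reader.
    static String transform(XMLReader reader, String markup) {
        try {
            Transformer xslt = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            InputSource input = new InputSource(new StringReader(markup));
            xslt.transform(new SAXSource(reader, input), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    // Stand-in event source: the JDK's XML parser (requires well-formed input).
    static XMLReader defaultReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(defaultReader(),
                "<html><body><h1>One</h1><p>x</p><h1>Two</h1></body></html>"));
    }
}
```

Changing the shape of the output XML then means editing only the stylesheet, which is what makes this approach customizable.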
1023
... Thanks John. Can you help me with some sample code? I have not used TagSoup or Saxon before. Alternatively, could you direct me to some documentation on them? ...
akhil192502
Feb 13, 2008
11:35 am
1024
... Saxon is well-documented at http://saxon.sourceforge.net. You'll need to know XSLT, though. -- Mark Twain on Cecil Rhodes: John Cowan I...
John Cowan
johnwcowan
Feb 13, 2008
4:58 pm
1027
Hi, I've downloaded the 1.2 source code and have seen some folders, tssl and stml. What are these for? I haven't found anything about them in the documentation. ...
Diego Campo
diego.campo@...
Mar 5, 2008
11:36 am
1028
... They are required when building TagSoup from source. You cannot just compile the provided source code yourself -- you must use Ant. -- John Cowan...
John Cowan
johnwcowan
Mar 5, 2008
1:43 pm
1029
Is this to produce the jar? I'd like to integrate the code so I can make my own changes if necessary, with no jar creation. Should I then integrate the tagsoup...
Diego Campo
diego.campo@...
Mar 5, 2008
3:48 pm