tagsoup-friends · Friends of TagSoup
Messages
Messages 935 - 980 of 1386
935
Hi. I am having a problem with conversion of HTML entities. The specific entity that is causing me problems at the moment is the entity &#56256;. When I try to...
Jaran Nilsen
jaran.nilsen@...
Sep 3, 2007
9:16 am
936
... Just set the output encoding to something other than UTF-8. It has to be something your Java VM understands; US-ASCII will always work. -- John Cowan...
John Cowan
johnwcowan
Sep 3, 2007
7:22 pm
937
My input documents are Russian, Chinese and whatnot, so I fear US-ASCII will not do me much good? Or am I wrong? First thing I do when I download the documents...
Jaran Nilsen
jaran.nilsen@...
Sep 3, 2007
7:27 pm
938
... No, it's the *output* encoding that controls whether character references are generated. TagSoup doesn't know which encodings can support which characters...
John Cowan
johnwcowan
Sep 4, 2007
2:26 am
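
The point in message 938 (the output encoding decides whether character references are written) can be illustrated with a minimal sketch. This is not TagSoup's code: the class `CharRefSketch` and the helper `escapeUnencodable` are hypothetical names, and the sketch assumes only the standard `java.nio.charset` API. It writes a numeric character reference for any code point the chosen output encoding cannot represent, and leaves everything else alone. It does not escape `&` or `<`, which a real serializer must also handle.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Illustration only: hypothetical helper, not part of TagSoup.
class CharRefSketch {

    // Replace each code point that outputEncoding cannot represent
    // with a decimal numeric character reference such as &#8217;.
    static String escapeUnencodable(String text, String outputEncoding) {
        CharsetEncoder encoder = Charset.forName(outputEncoding).newEncoder();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (encoder.canEncode(s)) {
                out.append(s);                           // representable: copy through
            } else {
                out.append("&#").append(cp).append(';'); // not representable: reference
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "l\u2019eau"; // contains U+2019, a right single quotation mark
        System.out.println(escapeUnencodable(sample, "US-ASCII")); // l&#8217;eau
        System.out.println(escapeUnencodable(sample, "UTF-8"));    // unchanged
    }
}
```

Under these assumptions, input in Russian or Chinese would still survive a US-ASCII output encoding, because every non-ASCII character is written as a reference rather than dropped.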
939
Ok, I will see if I can solve it somehow. Thanks a lot for your input :) Jaran...
Jaran Nilsen
jaran.nilsen@...
Sep 4, 2007
6:16 am
940
... This bug breaks tagsoup for my use. I am willing to help fix it. Is the bug in definitions/html.stml? Or I could fall back to version 1.0. Where is there a...
Tom Van Vleck
thvv
Sep 14, 2007
9:39 pm
941
... Very probably. I'll try to look for it this weekend, if I can. ... http://www.ccil.org/~cowan/XML/tagsoup/tagsoup-1.0.jar . But I make no guarantees that...
John Cowan
johnwcowan
Sep 14, 2007
9:48 pm
942
Hi, I have an application which receives html which comes out of a Mozilla application, subsequently the structure of the html is valid (in that all tags have...
richardwilko
Sep 19, 2007
10:21 am
943
... I assume you mean that it has unmarked empty-tags, unquoted attribute values, short-form attributes like "checked" for "checked='checked'" and the like....
John Cowan
johnwcowan
Sep 19, 2007
12:15 pm
944
Hi, ... Yes all them sort of things, our application is a crawler so it gets exposed to all sorts of pages. For example the first error i get in my current...
richardwilko
Sep 19, 2007
2:14 pm
948
Hi there. There were signals on Nutch mailing list that TagSoup forces entity substitution in URIs. This indeed seems to be the case -- not good for the ...
Dawid Weiss
dawid_weiss
Oct 17, 2007
12:16 pm
953
When using TagSoup from the command line, one can use the --lexical option to have it report comments. How does one do this programmatically? I tried just...
Elliotte Harold - jav...
elharo@...
Nov 2, 2007
8:16 pm
954
Disregard previous message. The comments are going through. The bug (as yet unidentified) is not where we thought it is, but does not seem to be in TagSoup. --...
Elliotte Harold - jav...
elharo@...
Nov 2, 2007
8:32 pm
964
Hello, I've downloaded TagSoup version 1.1.3 from the TagSoup homepage and am able to use it with Java 6. Now I've tried to use it with Java 5 (1.5.0_12),...
Ole Laurisch
ole.laurisch@...
Dec 12, 2007
4:29 pm
965
Hm... works for me on Java 1.5.0_14. Just to get the ball rolling, I'll ask the usual starter question (with no insult meant): are you sure you have...
dpq1pt2bc
Dec 12, 2007
9:41 pm
966
For interest's sake, in our app we tell TagSoup to retain comments from the HTML input like this: XMLWriter xmlWriter; try { xmlWriter = new XMLWriter(new ...
dpq1pt2bc
Dec 12, 2007
9:47 pm
967
For completeness: The 'UTF8' variable (class member, actually) is defined as "utf-8" in the code I just posted. Mark ... finds...
dpq1pt2bc
Dec 12, 2007
9:49 pm
968
Hi Mark, at first I wanted to answer "Hey, c'mon! Sure I have the tagsoup jar in my classpath", but then I double checked it and found out the following. All...
Ole Laurisch
ole.laurisch@...
Dec 13, 2007
7:45 am
969
I have found a web site [http://canada.com/] which uses '<?xml:namespace prefix = cwi />' in many of its pages, including its main page. The pages start with...
dpq1pt2bc
Dec 14, 2007
5:16 pm
970
... Just a minor nitpick: ... It is actually not even well-formed xml: processing instructions can not have target that starts with 'xml' (case insensitive),...
Tatu Saloranta
cowtowncoder
Dec 14, 2007
5:43 pm
971
... Not minor at all. :) However, I do not believe you are correct. From the same XML 1.0 recommendation <http://www.w3.org/TR/REC-xml/#sec-pi> you...
Mark Fitzgerald
dpq1pt2bc
Dec 14, 2007
6:15 pm
972
... Right. Considered without regard to case: <?xml ...> is not well formed as a PI (the XML declaration is not a PI); <?xml:foo ...?> is XML well-formed, but...
John Cowan
johnwcowan
Dec 14, 2007
9:14 pm
973
... You are absolutely correct. :-) My mistake -- I did mix up rules for PI names and restrictions on reserved namespace prefixes (where anything starting with...
Tatu Saloranta
cowtowncoder
Dec 14, 2007
10:52 pm
974
... Thanks, John. Best of luck finding some free time; I'll keep my eyes peeled for your update. :) Mark...
Mark Fitzgerald
dpq1pt2bc
Dec 17, 2007
5:13 pm
975
... Thanks for confirming - I wasn't 100% sure. :) ... Interesting tidbit! What section is that in? ... Mark...
Mark Fitzgerald
dpq1pt2bc
Dec 17, 2007
5:14 pm
976
As a New Year's present to the TagSoup community (and to fulfill a pre-New-Year resolution of mine), I've completed development work on TagSoup 1.2. This is...
John Cowan
johnwcowan
Jan 1, 2008
10:19 pm
977
There are a great many changes, most of them fixes for long-standing bugs, in this release. Only the most important are listed here; for the rest, see the...
John Cowan
johnwcowan
Jan 5, 2008
5:15 pm
978
... bugs, ... instance of ... start-tags. ... I've created the upload request, so it should be available in the Maven repositories in a few days: ...
Stephen Duncan Jr
scdjr42
Jan 6, 2008
2:28 pm
979
... Thanks. -- John Cowan cowan@... http://ccil.org/~cowan Female celebrity stalker, on a hot morning in Cairo: "Imagine, Colonel Lawrence, ninety-two...
John Cowan
johnwcowan
Jan 6, 2008
2:33 pm
980
Hello, my program takes a string from a database which contains something like: "<p>l&rsquo;eau est froide</p>". The &rsquo; entity is '. In my SAX parser, I...
mickael_ourtaau
Jan 7, 2008
5:51 pm