tagsoup-friends · Friends of TagSoup
Messages

  Messages Help
Advanced
Messages 847 - 939 of 1386
847
All versions of TagSoup are now, by the wave of my magic wand, licensed under the Apache 2.0 license as well as the Academic Free License 3.0 and the GNU GPL...
John Cowan
johnwcowan
Jun 2, 2007
5:47 am
848
I was using TagSoup to parse some HTML and kept getting funny errors, and I finally realized that it was attempting to parse the tags within comments as well....
briantfan
Jun 5, 2007
10:00 pm
849
... Short answer: It's because TagSoup by default transforms the page to XHTML. If you want HTML output, use the --html switch. Long answer: Comments inside...
John Cowan
johnwcowan
Jun 6, 2007
6:16 am
850
Thanks for the quick response. I first tried running with the --html switch, but that yielded this on the test page: <html><head><title>Test...
briantfan
Jun 6, 2007
8:53 pm
851
... That is indeed the issue. ... Definitely not going there! ... This might or might not help; I'm not sure. Essentially it would be a change in parsing...
John Cowan
johnwcowan
Jun 6, 2007
9:04 pm
852
Forgot to add: I do see why comments have to get escaped within JavaScript blocks for XHTML. But it seems like comment nodes don't even make it into the ...
briantfan
Jun 6, 2007
9:08 pm
853
... Depending on the DOM implementation (TagSoup doesn't have a DOM implementation itself) you may need to tell it to tell TagSoup to report lexical features,...
John Cowan
johnwcowan
Jun 6, 2007
10:29 pm
854
... Back in the day, the day being 1997 when we had to support Netscape Navigator 3 and Internet Explorer 3, this was a JavaScript FAQ: it was essential to use...
Nick Fitzsimons
nick@...
Jun 6, 2007
11:51 pm
855
... Exactly! ... And so is wrapping JavaScript in comment delimiters for the sake of unbelievably ancient browsers. -- John Cowan cowan@......
John Cowan
johnwcowan
Jun 7, 2007
12:15 am
866
Hi All, I am new to TagSoup and need your help using it. Basically, I want to know exactly how to set up and use TagSoup. I have downloaded...
savitha Mariswamy
savitm2003
Jun 20, 2007
5:16 pm
867
Note: forwarded message attached. ... Hi All, I am new to TagSoup and need your help using it....
savitha Mariswamy
savitm2003
Jun 20, 2007
5:30 pm
868
... You need to understand how to use SAX parsers in general. Start at sax.sourceforge.net, or google for "SAX tutorial". -- Principles. You can't say A is...
John Cowan
johnwcowan
Jun 20, 2007
8:03 pm
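For readers landing on the same question, here is a minimal sketch of the SAX pattern that message 868 points to: create an XMLReader, register a ContentHandler, and call parse. It uses only the JDK's built-in parser, so the input must be well-formed. The class name SaxElementLister and the method elementNames are hypothetical names chosen for illustration. With the TagSoup jar on the classpath, the reader would instead be created as new org.ccil.cowan.tagsoup.Parser(), which implements the same XMLReader interface; that substitution is not exercised here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: lists element names in document order using plain SAX.
class SaxElementLister {

    static List<String> elementNames(String markup) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            // Assumption: JDK parser stands in for TagSoup's XMLReader here.
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    // Called once per start-tag as the parser streams the input.
                    names.add(localName);
                }
            });
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<html><body><p>Hello</p></body></html>"));
    }
}
```

The handler never sees a tree; it only receives events, which is why any DOM (JDOM, dom4j, and so on) has to be built by a separate layer on top of the reader.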
871
Hi All, For conversion from HTML to XHTML, I am using TagSoup, but it does not work well with MathML tags. E.g., if my tags are like: <html><mathml><mstyle...
savitha Mariswamy
savitm2003
Jun 28, 2007
4:09 pm
872
... Right. TagSoup does not currently handle foreign tagsets very well. Someone could write a MathML schema in TagSoup Schema Language, but it would also be...
John Cowan
johnwcowan
Jun 28, 2007
6:38 pm
873
Thanks John, Are you aware of any parser which supports foreign or MathML tags? Does JTidy support this? --Savitha John Cowan <cowan@...> wrote: ... ...
savitha Mariswamy
savitm2003
Jun 28, 2007
7:11 pm
880
... I had formed the impression (not sure from where) that the TagSoup 'rectifier' did no lookahead, but I can't square that with the following...
ht@...
henrysthompson
Jul 5, 2007
12:27 pm
881
... Correct. ... The document root start-tag is magic, and corresponding end-tags are ignored. So the first <html> is the document root, the first </html> is...
John Cowan
johnwcowan
Jul 6, 2007
1:08 am
883
Hi John and fellow members, I'm working on a program to analyze web page structural similarity and am currently using TagSoup as the HTML parser and JDOM to form...
robin_rspvh
Jul 6, 2007
7:06 am
885
... That's a JDOM issue; JDOM wants the comments and asks TagSoup for them. -- De plichten van een docent zijn divers, John Cowan die van het gehoor...
John Cowan
johnwcowan
Jul 6, 2007
9:26 pm
893
Hi. Is there any way to tell TagSoup to remove any HTML comments it finds in the input document? I have been unable to find this in the documentation. Thanks...
Jaran Nilsen
jaranmann
Jul 11, 2007
8:25 am
894
... If you mean the command-line program, they are removed by default. If you mean the library, it's all about whether you register a LexicalHandler with the...
John Cowan
johnwcowan
Jul 11, 2007
3:18 pm
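The point in message 894, that comments reach the application only if a LexicalHandler is registered, can be sketched with standard SAX2 calls. The example below runs against the JDK's built-in parser on well-formed input. CommentCollector and its comments method are hypothetical names. The property URI is the standard SAX2 one; TagSoup's Parser is an XMLReader and, as far as I know, accepts the same property, but treat that as an assumption since TagSoup is not on the classpath here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: collects comments only when a LexicalHandler is registered.
class CommentCollector {

    // Standard SAX2 property id for registering a LexicalHandler.
    static final String LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";

    static List<String> comments(String markup, boolean registerLexicalHandler) {
        List<String> found = new ArrayList<>();
        try {
            // Assumption: JDK parser stands in for TagSoup's XMLReader.
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler2 implements both ContentHandler and LexicalHandler.
            DefaultHandler2 handler = new DefaultHandler2() {
                @Override
                public void comment(char[] ch, int start, int length) {
                    found.add(new String(ch, start, length));
                }
            };
            reader.setContentHandler(handler);
            if (registerLexicalHandler) {
                // Without this call the parser never reports comments.
                reader.setProperty(LEXICAL_HANDLER, handler);
            }
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String doc = "<p>a<!-- note -->b</p>";
        System.out.println(comments(doc, true));
        System.out.println(comments(doc, false));
    }
}
```

So "removing comments" in library use amounts to not registering the handler; tree builders that want comments register one themselves, which is consistent with the JDOM remark in message 885.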
903
Hi, It's not clear to me from either the license or the source code comments whether TagSoup is copyrighted, and whether it requires attribution if used in a...
m_i_k_e_i_l_e_s
Jul 20, 2007
1:53 am
904
... It is copyrighted by me (there are no copyright notices at present, but they are not actually required; I'll be adding new ones in the fairly near future)....
John Cowan
johnwcowan
Jul 20, 2007
2:58 am
915
... Span is for inline content, div for block content, so yes, that happens. If you don't like it, change the schema. -- John Cowan http://ccil.org/~cowan...
John Cowan
johnwcowan
Jul 29, 2007
11:46 pm
934
Hi, I am using TagSoup, Dom4J and Jaxen to parse various web-pages and pull out some key pieces of data. Mostly, and mostly thanks to TagSoup itself of course,...
edsherington
Aug 30, 2007
9:43 am
935
Hi. I am having a problem with conversion of HTML entities. The specific entity that is causing me problems at the moment is the entity &#56256;. When I try to...
Jaran Nilsen
jaran.nilsen@...
Sep 3, 2007
9:16 am
936
... Just set the output encoding to something other than UTF-8. It has to be something your Java VM understands; US-ASCII will always work. -- John Cowan...
John Cowan
johnwcowan
Sep 3, 2007
7:22 pm
937
My input documents are Russian, Chinese and whatnot, so I fear US-ASCII will not do me much good? Or am I wrong? First thing I do when I download the documents...
Jaran Nilsen
jaran.nilsen@...
Sep 3, 2007
7:27 pm
938
... No, it's the *output* encoding that controls whether character references are generated. TagSoup doesn't know which encodings can support which characters...
John Cowan
johnwcowan
Sep 4, 2007
2:26 am
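The distinction drawn in message 938, that the output encoding (not the input encoding) decides whether character references are generated, can be illustrated with the JDK's own XML serializer. This is a sketch of the general principle only: it does not use TagSoup's writer, and EncodingDemo and serialize are hypothetical names. The input stays Cyrillic in both runs; only the requested output encoding changes.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: re-serializes XML in a chosen output encoding.
class EncodingDemo {

    static String serialize(String xml, String outputEncoding) {
        try {
            // Identity transform: copies the input to the output unchanged,
            // apart from how characters are written.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, outputEncoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return new String(out.toByteArray(), outputEncoding);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // "\u041f\u0440\u0438\u0432\u0435\u0442" is a Russian greeting,
        // escaped so the source file stays ASCII.
        String doc = "<p>\u041f\u0440\u0438\u0432\u0435\u0442</p>";
        // Characters that US-ASCII cannot represent come out as numeric references.
        System.out.println(serialize(doc, "US-ASCII"));
        // UTF-8 can represent them, so they are written literally.
        System.out.println(serialize(doc, "UTF-8"));
    }
}
```

Nothing is lost with US-ASCII output: any XML consumer resolves the numeric references back to the same Russian or Chinese characters, which addresses the worry raised in message 937.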
939
Ok, I will see if I can solve it somehow. Thanks a lot for your input :) Jaran...
Jaran Nilsen
jaran.nilsen@...
Sep 4, 2007
6:16 am