tagsoup-friends · Friends of TagSoup
Messages
Messages 359 - 390 of 1386
359
Hello all, I'm just getting started using the TagSoup library to parse some content. What I'm discovering is that for many HTML documents I get a parsing error...
Rob Konigsberg
rikonigsberg
Feb 8, 2006
7:45 am
360
... That can't be a TagSoup error, as TagSoup never reports any errors (except low-level IOExceptions when a file cannot be read or something of the sort)....
John Cowan
johnwcowan
Feb 8, 2006
8:12 am
361
John, thanks for the prompt reply. I may have been mistaken. I'll go back to the drawing board and see what I find. ... -- Robert Konigsberg ...
Rob Konigsberg
rikonigsberg
Feb 8, 2006
8:29 am
362
Last night in Albany at the CDJDN meeting someone complained about JavaScript errors in my XOM pages. That surprised me because I didn't think I had any...
Elliotte Harold
elharo@...
Feb 10, 2006
2:34 am
363
... Because HTML is an SGML application, not an XML one. The script and style elements are of type CDATA, which means that after the start-tag, the only...
John Cowan
johnwcowan
Feb 10, 2006
4:38 am
364
... The problem is I do want well-formed XHTML output. i.e. I do want empty elements to be closed (or use an empty-element tag). Is there any way to get both? ...
Elliotte Harold
elharo@...
Feb 10, 2006
9:49 am
365
... That would be self-contradictory. The "comment" is not really an HTML comment, but if I left the < unescaped it would become an XML comment and disappear...
John Cowan
johnwcowan
Feb 10, 2006
1:06 pm
366
John, Unsurprisingly, you were right. Thanks again for the quick response. Robert ... -- Robert Konigsberg konigsberg@... "Uh Oh. This does not look good...
Rob Konigsberg
rikonigsberg
Feb 10, 2006
3:18 pm
367
Hi, I noticed that the TagSoup parser chops any attribute value which is longer than 1980 characters. I think the HTML spec mentions 1024 characters as the...
amir_langer
Feb 14, 2006
3:21 pm
368
I'm new to TagSoup. I really like the command line interface and its speed and features. I didn't notice much documentation for using the API so I'm hoping you ...
Ray Grieselhuber
rgrieselhuber
Feb 15, 2006
1:01 am
369
... [snip] ... Currently no, but you can change the size of the array theOutputBuffer in src/templates/org/ccil/cowan/tagsoup/HTMLScanner.java from 2000 to...
John Cowan
johnwcowan
Feb 15, 2006
2:22 am
370
... This is the part that TagSoup provides; it is a SAX parser. ... You can use XOM or any other tree model you like; all of them support external SAX parsers....
John Cowan
johnwcowan
Feb 15, 2006
5:34 am
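For readers following the point in message 370 (TagSoup is a SAX parser, and tree models accept an external SAX parser), here is a minimal sketch of that wiring using only the JDK. The class and method names are hypothetical. The JDK's own XML parser stands in for TagSoup so the example runs without extra jars; the assumption is that with TagSoup on the classpath you would pass `new org.ccil.cowan.tagsoup.Parser()` (which implements `XMLReader`) as the reader instead, and that XOM users would hand the same reader to `nu.xom.Builder`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical sketch: build a tree from whatever SAX events a reader emits.
class SaxToTreeSketch {

    /** Builds a DOM tree from the SAX events produced by the given reader. */
    static Document buildTree(XMLReader reader, String markup) throws Exception {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        DOMResult result = new DOMResult();
        // An identity transform copies the SAX event stream into a new DOM document.
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return (Document) result.getNode();
    }

    /**
     * Stand-in reader: the JDK's built-in XML parser.
     * Assumption: with TagSoup available, this would be replaced by
     * new org.ccil.cowan.tagsoup.Parser().
     */
    static XMLReader standInReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildTree(standInReader(), "<html><body><p>Hello</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getElementsByTagName("p").item(0).getTextContent());
    }
}
```

The stand-in parser only accepts well-formed input, which is exactly the limitation TagSoup is meant to remove; the tree-building code itself should not need to change.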
371
Ok. So, unless I missed it on the website, I didn't see something that documents using the API to import the file and parse it. Is there documentation, or can...
Ray Grieselhuber
rgrieselhuber
Feb 15, 2006
5:46 am
372
... If you look at CommandLine.java, you will see how it's done there. -- John Cowan cowan@... www.ccil.org/~cowan www.ap.org If I have seen farther...
John Cowan
johnwcowan
Feb 15, 2006
6:10 am
373
Thanks for the speedy reply....
amir_langer
Feb 15, 2006
10:34 am
376
Hi, I have just switched from Tidy to TagSoup after running into Tidy's inherent bugs and heavyweight design. Here is what I need to do: 1) I have dirty text...
Anurag Singh
as_vns_007
Feb 28, 2006
3:20 pm
377
... There are two approaches. You can write a program which processes the SAX events directly and retains the ones you want while discarding the ones you...
John Cowan
johnwcowan
Feb 28, 2006
4:01 pm
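The first approach mentioned in message 377 (process the SAX events directly, keep the ones you want, discard the rest) could look roughly like the sketch below. It is not taken from the thread: the class name, the choice of `script` and `style` as unwanted elements, and the helper `keptText` are all assumptions for illustration, and the JDK parser again stands in for TagSoup as the event source.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: a SAX filter that drops unwanted elements and their content.
class DropElementsFilter extends XMLFilterImpl {
    private final Set<String> unwanted;
    private int skipDepth = 0; // greater than 0 while inside an unwanted element

    DropElementsFilter(Set<String> unwanted) {
        this.unwanted = unwanted;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || unwanted.contains(qName)) {
            skipDepth++; // swallow this element and everything nested in it
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses the markup and returns only the text that survives the filter. */
    static String keptText(String markup) throws Exception {
        DropElementsFilter filter = new DropElementsFilter(Set.of("script", "style"));
        // Stand-in event source; assumption: a TagSoup Parser could be the parent instead.
        filter.setParent(SAXParserFactory.newInstance().newSAXParser().getXMLReader());
        StringBuilder text = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(markup)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(keptText("<p>keep<script>drop</script> this</p>"));
    }
}
```

Because the filter is itself an `XMLReader`, it can sit between the parser and any downstream handler or tree builder without either side knowing it is there.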
378
Can you please help me find where HTML entities are transformed in the source code? Actually, I don't want to transform XML entities (&amp;, &lt;, etc.) in the...
Anurag Singh
as_vns_007
Mar 1, 2006
3:49 pm
379
... All SAX parsers convert entities to characters when returning values to the caller. It's up to the caller, if it intends to produce output as XML, to...
John Cowan
johnwcowan
Mar 1, 2006
4:00 pm
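To illustrate the division of labour described in message 379, here is a small sketch, assuming nothing beyond the JDK: the parser hands the caller plain characters, and the caller escapes them again when writing XML. The class and method names are hypothetical, and the hand-rolled `escapeText` is only an illustration of the caller's job, not how TagSoup's XMLWriter is implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: entities arrive as characters; the caller re-escapes on output.
class EscapeOnOutputSketch {

    /** Returns the character data exactly as the SAX parser reports it. */
    static String parsedText(String markup) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(markup)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }
                });
        return text.toString();
    }

    /** Escapes the characters that must not appear raw in XML text content. */
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String raw = parsedText("<p>a &amp; b &lt; c</p>");
        System.out.println(raw);             // what the handler receives
        System.out.println(escapeText(raw)); // what should be written back out as XML
    }
}
```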
380
Thanks John. I'm able to do it with XMLWriter or a regex. But when I remove "<entity name='amp' codepoint='0026'/>" from definitions.html.tssl then it...
Anurag Singh
as_vns_007
Mar 2, 2006
6:23 am
381
... When TagSoup sees an entity that's not in its tables, it returns the entity reference as text (what else is there to do, given the Keep On Truckin'...
John Cowan
johnwcowan
Mar 2, 2006
1:16 pm
382
Hi, I'm an experienced Java programmer but I've never written a SAX application from scratch and I'm not sure how it should all hang together. I'm trying to...
Alex Worden
alexworden
Mar 2, 2006
7:32 pm
383
... Spiders aren't exactly trivial code, but thanks for thinking of TagSoup. You might want to reuse someone else's spider and then use TagSoup to postprocess...
John Cowan
johnwcowan
Mar 2, 2006
7:41 pm
384
If you run TagSoup on "&c" you get out "&amp;c&#65535;". If you run it on "<b>&c</b>" or "&c " you get out "<b>&amp;c</b>". Leigh....
Klotz, Leigh
leighklotz
Mar 3, 2006
12:45 am
385
... It's mishandling EOF somewhere. I hope to be able to do some more work on TagSoup soon. I've recently changed jobs, which has been stressful. Let me take...
John Cowan
johnwcowan
Mar 3, 2006
3:07 am
386
Hi, in order to evaluate TagSoup, I downloaded it and tried it with "java -jar tagsoup-1.0rc3 --files --html ..." on some HTML pages saved with 'save page...
Pierre Bru
pbru_2001
Mar 4, 2006
10:32 pm
387
... Can you send me the particular HTML page that failed? -- John Cowan: Even a refrigerator can conform to the XML Infoset, as long as it has a door...
John Cowan
johnwcowan
Mar 5, 2006
7:19 am
388
Hi, I wanted to rebuild TagSoup to be able to watch it work in the debugger, but the compiler complains about HTMLSchema and HTMLScanner. I looked in the...
Pierre Bru
pbru_2001
Mar 5, 2006
5:32 pm
389
... Install Ant and say "ant" from the root TagSoup directory; that will drop the current TagSoup .jar in the dist/lib directory. Make sure you are building...
John Cowan
johnwcowan
Mar 5, 2006
6:31 pm
390
... Well... I'm using Eclipse 3.1. I suppose there is an Ant build file somewhere. I will try to figure out how to set up Eclipse to use it. Thanks. Pierre....
Pierre Bru
pbru_2001
Mar 5, 2006
7:21 pm