I need some help from anyone here. I am doing a project in screen scraping. I am able to scrape the screen and then parse it, but I am unable to...
Hi John, ... In my opinion, either is preferable to breaking the form in half :-) That's certainly not what a typical browser would do, and it breaks the...
Hi, We’ve come across a problem with a greater than sign in quoted attributes. If another attribute follows, this attribute is being excluded from the rest...
sytse@...
Sep 3, 2004 4:10 pm
142
... This is another "damned if I do, damned if I don't" situation. There's a *lot* of HTML out there with unterminated quotes, in which case the attribute...
Hmm, it seems bad if tagsoup mis-parses *good* html. I just did a bit of experimenting and both Mozilla and IE seem to parse this situation strictly - i.e....
... This argument is compelling. I'll change this in the next release. Until then, you can patch the behavior out by removing lines 52 and 153, respectively: ...
... It probably wasn't a good idea. To the extent possible you should figure out what sanity checks IE does on its input. However IE interprets the input is...
Hi, I'm having a little trouble with TagSoup parsing JavaScript. The thing is for instance, that if there is a <script>document.write('<textarea'>) that...
sytse@...
Sep 7, 2004 4:37 pm
147
... I can't reproduce this problem. What version of TagSoup are you using? -- John Cowan <jcowan@...> http://www.reutershealth.com "But no...
... Yes, I'm sorry. I was still using the previous version of TagSoup because I made some changes to the previous one. But the problem still stands, the test...
sytse@...
Sep 8, 2004 4:32 pm
150
... This problem results from a bad fix I made back in 0.9.4. The next release will restore the 0.9.3 code. -- Real FORTRAN programmers can program FORTRAN...
Version 0.10.2 fixes some long-standing bugs in the areas of entity references within attribute values, well-formed names for elements and processing...
Very nice! Did you still have any ideas on the form problem, are you thinking on changing that in a coming release? Cheers, Sytse ... This mail sent through...
sytse@...
Sep 9, 2004 10:39 am
153
... Thank you. ... I'm still thinking about it, but I'm not ready to commit to adding it to the TODO list. -- "Kill Gorgûn! Kill orc-folk! John...
I've put together a (new, improved - not much) Java version [1] of the naive generic XML cleaner I did for Python. As Norm Walsh put it : " I have new...
Hi, We've had a look at the form problem, basically focussing on the following problem: When you have an input like ...
sytse@...
Sep 10, 2004 5:27 pm
156
I have the following: "\rsome\r\n\tfreeform\ncontent" It's coming out as: "\rsome\r\n freeform\ncontent" at the other end. Notice the TAB character was...
... Not only is that not to be expected. It is to be expected that the \r and \r\n would both be changed to \n (assuming those are not literal backslashes)....
Elliotte Harold
elharo@...
Sep 17, 2004 12:07 pm
158
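A small sketch of the line-end rule mentioned in the reply above. It uses the JDK's built-in DOM parser rather than TagSoup, which is an assumption: the original message does not say which parser or serializer was involved. The class name LineEndDemo and the wrapper element <a> are made up for illustration. XML 1.0 (section 2.11) requires a parser to hand both \r\n and a lone \r to the application as \n. A literal tab in element content is passed through unchanged, whereas a literal tab inside an attribute value would be normalized to a space.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of TagSoup or of the original thread.
public class LineEndDemo {

    // Parses a small document and returns the text content of its root element.
    static String parseText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Makes control characters visible so the result can be read on a console.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) throws Exception {
        // Same character sequence as in the message, wrapped in an element.
        String input = "<a>\rsome\r\n\tfreeform\ncontent</a>";
        System.out.println(visible(parseText(input)));
    }
}
```

If the tab in the original report sat inside an attribute value, the space seen "at the other end" would be consistent with attribute-value normalization, but the truncated message does not say where the text lived.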
Hello, I downloaded the jar file and ran it with java -Dfiles=true -jar tagsoup-0.10.2.jar test where "test" is the html file and got the following error: ...
... What version of Java? You need Java 1.4 or better to have the org.xml.sax and javax.xml packages automatically available. Else, you'll have to provide an...
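One way to check the point made in this reply is to ask the running JVM whether the SAX and JAXP classes can be loaded at all. The following is a minimal sketch, not something from the thread: the class name SaxCheck and the helper isAvailable are hypothetical, and the two class names probed are simply well-known members of the org.xml.sax and javax.xml.parsers packages.

```java
// Hypothetical helper; reports whether the XML APIs are visible to this JVM.
public class SaxCheck {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("org.xml.sax.XMLReader: "
                + isAvailable("org.xml.sax.XMLReader"));
        System.out.println("javax.xml.parsers.SAXParserFactory: "
                + isAvailable("javax.xml.parsers.SAXParserFactory"));
    }
}
```

If either line reports false, the runtime lacks the bundled XML APIs and a separate implementation would have to be put on the classpath.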
The only code example I've found for calling tagsoup from a Java program is here: http://www.hackdiary.com/archives/000041.html I can successfully process an...
Hi, I have a little problem with a document.write in TagSoup. With the following input: document.write('</scr + ipt>'); in the output </scr + ipt> will be...
sytse@...
Sep 28, 2004 2:56 pm
163
... I use Tagsoup in a number of little scripts. Here's how I do it with JDOM: import org.ccil.cowan.tagsoup.Parser; import org.jdom.input.SAXHandler; import...
Brian Lalor
blalor-k-yahoo.f6bdbf...
Oct 1, 2004 12:08 am
164
TagSoup 1.0rc1 (release candidate 1 of version 1.0) has been released. Please hammer on this and let me know what you find. Thanks. It's in the usual place:...
Nux has matured, and this is to announce the availability of the nux-1.0a2 release. Nux (http://dsd.lbl.gov/nux) is a small, straightforward, and surprisingly...
Wolfgang Hoschek
whoschek@...
Oct 12, 2004 6:50 pm
166
Wolfgang, As I warned you earlier, there is a serious license conflict here. This product includes code directly copied and pasted from XOM, which is fine, but...
Elliotte Harold
elharo@...
Oct 12, 2004 10:30 pm
167
... One API comment. The DocumentWrapper getUnparsedEntity entity method is declared thusly: public String[] getUnparsedEntity(String name) You say this...
Elliotte Harold
elharo@...
Oct 12, 2004 10:32 pm
168
Elliotte, Lots of fuss about a little patch adding one constructor argument to XSLTransform! As you are well aware, the XOM code you refer to retains your ...
Wolfgang Hoschek
whoschek@...
Oct 12, 2004 11:29 pm
169
... No, I indicated that I would consider it after I get XML 1.0 out the door. I made no promises that I would do it. Possibly it will get in. Possibly it...
Elliotte Harold
elharo@...
Oct 13, 2004 12:33 am
170
I've uploaded nux-1.0a3 to address Elliotte's concerns wrt. XOM: Changelog: • Separated the patched class nu.xom.xslt.XSLTransform (LGPL licensed, copyright...