Hello group!
I have run into some strange behaviour in pavuk, and I think it is a bug.
I use pavuk 0.9.31:
lomov@theor:~$ pavuk --version
pavuk 0.9.31 2005-01-14T13:56
Optional features available :
- Debug mode
- GNU gettext internationalization of messages
- flock() document locking
- HTTP and FTP over SSL
- SSL layer implemneted with OpenSSL library
- optional regex patterns in -fnrules and -*rpattern options
- POSIX regexp
- support for detecting whether pavuk is running as background job
- multithreading support
- NTLM authorization support
- IPv6 support
I downloaded the site http://www.linuxtopia.org, in particular its
Perl_Programming subdirectory. This subdirectory contains the files
pickingUpPerl_[0-9]*.html. When I tried to view the local copies of
these files, they were not displayed correctly. I opened some of them in
a text editor and found that, beginning at some line, the markup symbols
(< and >) had been removed.
This was strange to me, because I had viewed these files on the Internet
and they looked fine.
Later I downloaded the site with wget, and all the files were fine.
I suppose that pavuk incorrectly handles the following markup (taken
from the file pickingUpPerl_20.html):
...
(line 81) </td>
(line 82) <td style="bodycolwidth"; vertical-align: top>
(line 83)
(line 84)
(line 85)
(line 86) <BODY LANG="" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF"
VLINK="#800080" ALINK="#FF0000">
...
(This file was downloaded with wget.)
The corresponding lines of the file downloaded by pavuk:
...
(line 81) </td>
(line 82) <td style="bodycolwidth"; vertical-align: top
(line 83)
(line 84)
(line 85)
(line 86) BODY LANG="" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF"
VLINK="#800080" ALINK="#FF0000"
...
It is obvious that there is a syntax error in the original file (the
site's file) at line 82. For some reason pavuk strips all the markup
symbols from the end of that tag onwards.
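
To locate the point where the brackets start to disappear without
reading the files by eye, the two copies can be compared line by line.
Below is a minimal sketch in Python. It is not part of pavuk or wget;
the function names are my own, and the sample lines are adapted from the
excerpts above, so the reported number is an index into the sample, not
a line number in the real file.

```python
from itertools import zip_longest


def strip_brackets(line):
    """Remove every '<' and '>' from a line."""
    return line.replace("<", "").replace(">", "")


def first_bracket_loss(reference_lines, suspect_lines):
    """Find the first line where the two copies differ.

    Returns (line_number, only_brackets_differ) with a 1-based line
    number, or None if the copies are identical.
    """
    pairs = zip_longest(reference_lines, suspect_lines, fillvalue="")
    for number, (ref, sus) in enumerate(pairs, start=1):
        if ref != sus:
            # True when the lines match once '<' and '>' are ignored.
            return number, strip_brackets(ref) == strip_brackets(sus)
    return None


# Sample lines adapted from the excerpts above (assumed, shortened).
wget_copy = [
    "</td>",
    '<td style="bodycolwidth"; vertical-align: top>',
    "",
    '<BODY LANG="" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF"',
    'VLINK="#800080" ALINK="#FF0000">',
]
pavuk_copy = [
    "</td>",
    '<td style="bodycolwidth"; vertical-align: top',
    "",
    'BODY LANG="" BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF"',
    'VLINK="#800080" ALINK="#FF0000"',
]

print(first_bracket_loss(wget_copy, pavuk_copy))
```

With real files, the two lists could be built with
open(path).read().splitlines() for each copy. A second value of True
means the first differing line is identical apart from the angle
brackets, which is the symptom described above.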
P.S. Sorry if my English is poor.
---
WBR, Vladimir Lomov