pavuk · Pavuk Webgrabber Mailing List
pavuk tries to download from "file:///<various-paths>"
Message #741 of 988

When there is no existing copy of the downloaded files on my machine,
I note that Pavuk's third (or so) download is an attempt to
download the current working directory as a URL from the remote machine.

Once a copy of the remote files exists on my local machine, Pavuk
instead tries to download _._.html from the remote site.

In either case, the remote site doesn't have the requested item,
and the request will probably generate error messages in the remote
site's logs, to the operators' dismay.

Is there a command-line flag to stop this behavior?
Is it a bug?

If anyone has insight into this, I'd appreciate a private e-mail.
Thanks very much!

Please see the small snippets below.
Bob

WITH NO LOCAL FILES (see download item #3, i.e. file:///tmp, the current working directory):
[enger@localhost tmp]$ pwd
/tmp
[enger@localhost tmp]$ time pavuk -cdir /tmp/work -singlepage -norobots -nthreads 1 http://www.yahoo.com/
URL[ 1]: 1(0) of 1 http://www.yahoo.com/
download: OK
URL[ 1]: 2(0) of 16 http://us.i1.yimg.com/us.yimg.com/i/ww/m6v8y.gif
download: OK
URL[ 1]: 3(0) of 16 file:///tmp
download: ERROR: opening file
URL[ 1]: 4(1) of 16 http://us.i1.yimg.com/us.yimg.com/a/ho/hot_jobs/old54x16tran_tm.gif
download: OK
URL[ 1]: 5(1) of 16 http://us.i1.yimg.com/us.yimg.com/i/mntl/spo/03q3/hdr_nfl.gif
download: OK
[...]


NOW THAT THE LOCAL DIRECTORY HAS SOME CONTENTS, NOTE CHANGE IN #3:
[enger@localhost tmp]$ time pavuk -cdir /tmp/work -singlepage -norobots -nthreads 1 http://www.yahoo.com/
URL[ 1]: 1(0) of 1 http://www.yahoo.com/
File redirect
download: OK
URL[ 1]: 2(0) of 16 http://us.i1.yimg.com/us.yimg.com/i/ww/m6v8y.gif
File redirect
download: OK
URL[ 1]: 3(0) of 16 http://www.yahoo.com/_._.html
download: ERROR: HTTP document not found
URL[ 1]: 4(1) of 16 http://us.i1.yimg.com/us.yimg.com/a/ho/hot_jobs/old54x16tran_tm.gif
File redirect
download: OK
URL[ 1]: 5(1) of 16 http://us.i1.yimg.com/us.yimg.com/i/mntl/spo/03q3/hdr_nfl.gif
File redirect
download: OK
URL[ 1]: 6(1) of 16 http://us.i1.yimg.com/us.yimg.com/i/mntl/spo/03q4/favre_hug.jpg
[...]
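
For anyone who wants to spot these stray requests in a longer run, here is a minimal Python sketch that scans captured pavuk output for items that either use a file:// URL or end in an error. It assumes the output looks exactly like the snippets above ("URL[ n]: i(e) of t <url>" followed by a "download: ..." status line) and tolerates a URL that a mail client has wrapped onto the next line. The names parse_pavuk_log and suspicious are hypothetical helpers written for this illustration; they are not part of pavuk.

```python
import re
from typing import Iterator, List, NamedTuple


class Item(NamedTuple):
    index: int   # position in the queue, e.g. 3 in "3(0) of 16"
    url: str     # URL pavuk tried to fetch
    status: str  # text after "download:", e.g. "OK"


# Assumed line shapes, taken only from the snippets in this message.
_ITEM = re.compile(r"^URL\[\s*\d+\]:\s+(\d+)\(\d+\)\s+of\s+\d+\s*(\S*)$")
_STATUS = re.compile(r"^download:\s*(.+)$")


def parse_pavuk_log(text: str) -> Iterator[Item]:
    """Yield one Item per URL line that is followed by a status line."""
    index = None
    url = ""
    for raw in text.splitlines():
        line = raw.strip()
        m = _ITEM.match(line)
        if m:
            index, url = int(m.group(1)), m.group(2)
            continue
        if index is None:
            continue  # shell prompts, "[...]", and other noise
        s = _STATUS.match(line)
        if s:
            yield Item(index, url, s.group(1))
            index, url = None, ""
        elif not url and "://" in line:
            url = line  # URL was wrapped onto its own line


def suspicious(items: List[Item]) -> List[Item]:
    """Keep items that used a file:// URL or reported an error."""
    return [
        it for it in items
        if it.url.startswith("file://") or it.status.startswith("ERROR")
    ]
```

Feeding the saved output of either run above through suspicious(list(parse_pavuk_log(text))) should single out item #3 in both cases, which makes it easier to check whether a given set of flags avoids the extra request.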

Sun Dec 28, 2003 7:45 pm

engerenger

It's a bug. I fixed it at Netli before they fired me. I've got two sets of updates to merge with the mainline tree. I'm sorry I haven't been very responsive...
Martin Fouts
lists@...
Jan 4, 2004
3:13 pm