pavuk · Pavuk Webgrabber Mailing List
MaxDocs issue
Message #748 of 988
If you set MaxDocs: 300 (for example) in the scenario file, and then
go on and spider a site, pavuk correctly stops downloading documents
once 300 are found.
However, there are two issues as I see it:

1. If you have allowed only certain kinds of data with
AllowedMIMETypes, pavuk will still count blocked files when
incrementing up to 300. So in effect, if you spider a site where the
first 299 files happen to be images (and those are disallowed) you
will end up with only 1 file.

2. If pavuk decides there are 1300 docs in total on a site, it
will, after reaching the 300 limit specified with MaxDocs, continue to
parse each and every document, just to tell you that the max limit is
reached for each. It would be very desirable if it would just exit
once the limit is reached.
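
To make the two requests concrete, here is a rough sketch in Python of the
behaviour I would expect. This is not pavuk's code and says nothing about how
pavuk is implemented internally; the function name crawl, the (url, mime_type)
tuples, and the example.com URLs are all made up for illustration. Only
documents that pass the MIME filter count towards the limit, and the loop
exits as soon as the limit is hit instead of walking the rest of the queue.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for a queued document: (url, mime_type).
Doc = Tuple[str, str]


def crawl(queue: Iterable[Doc], allowed_mime: Set[str], max_docs: int) -> List[str]:
    """Return the URLs of accepted documents, stopping once max_docs are saved."""
    saved: List[str] = []
    if max_docs <= 0:
        return saved  # in this sketch a non-positive limit saves nothing
    for url, mime in queue:
        if mime not in allowed_mime:
            continue  # issue 1: rejected documents do not count towards the limit
        saved.append(url)
        if len(saved) >= max_docs:
            break  # issue 2: exit at once, do not visit the remaining queue
    return saved


# 299 disallowed images followed by 5 allowed HTML pages.
queue = [(f"http://example.com/img{i}.png", "image/png") for i in range(299)]
queue += [(f"http://example.com/page{i}.html", "text/html") for i in range(5)]

result = crawl(queue, {"text/html"}, max_docs=3)
print(len(result))  # 3
```

With this counting, the 299 images are skipped without using up the limit,
and the last two pages in the queue are never looked at.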

Is there any way to change these behaviours, or should they be filed
as bugs? If so, where should I file them? The SourceForge page
doesn't seem to be very active.

Thanks
Alec





Wed Apr 21, 2004 5:51 pm

maltepalte2000