If you set MaxDocs: 300 (for example) in the scenario file, and then
go on to spider a site, pavuk correctly stops downloading documents
once 300 are found.
However, there are two issues as I see it:
1. If you have allowed only certain kinds of data with
AllowedMIMETypes, pavuk will still count blocked files when
incrementing up to 300. So in effect, if you spider a site where the
first 299 files happen to be images (and those are disallowed) you
will end up with only 1 file.
2. If pavuk determines that there are 1300 docs in total on a site, it
will, after reaching the 300 limit specified with MaxDocs, continue to
parse each and every document, just to tell you that the max limit is
reached for each. It would be very desirable if it would just exit
once the limit is reached.
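To make the behaviour I would expect concrete, here is a minimal
sketch in Python. It is not pavuk's code and says nothing about how
pavuk works internally; the function name crawl, its parameters and
the sample queue are all hypothetical and only illustrate the two
points above: blocked files do not count towards the limit, and the
loop exits as soon as the limit is reached.

```python
from typing import Iterable, List, Set, Tuple


def crawl(docs: Iterable[Tuple[str, str]],
          allowed_mime_types: Set[str],
          max_docs: int) -> List[str]:
    """Hypothetical crawl loop: return the URLs that would be kept.

    docs yields (url, mime_type) pairs in the order they are found.
    """
    kept: List[str] = []
    if max_docs <= 0:
        return kept
    for url, mime_type in docs:
        if mime_type not in allowed_mime_types:
            # Point 1: a blocked document does not use up the quota.
            continue
        kept.append(url)
        if len(kept) >= max_docs:
            # Point 2: stop immediately, do not touch the rest of the queue.
            break
    return kept


# Illustrative data only: 299 images first, then 5 HTML pages.
queue = [(f"img{i}.png", "image/png") for i in range(299)]
queue += [(f"page{i}.html", "text/html") for i in range(5)]

result = crawl(queue, {"text/html"}, 3)
print(result)
```

With this logic the 299 images are skipped without being counted, and
the two remaining pages are never looked at once 3 are kept.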
Is there any way to change these behaviours, or should they be filed
as bugs? If so, where should I file them? The SourceForge page
doesn't seem to be very active.
Thanks
Alec