Hi, I am pretty new to Heritrix. Once in a while I get the error: Heritrix(-63)-Prerequisite unschedulable failure. What does it mean? I looked up the user manual...
This must be because of 'bad' seeds - which hosts are you trying to crawl? best -- Bjarne Andersen Daily Manager - netarchive.dk State & University Library ...
Hi. Attached to this mail you will find a newsletter from the netarchive.dk project - March 2007. The newsletter gives updates and news on both a collecting...
... I am sure someone can give a reason for why this is needed but try adding this decide rule to the end of your chain: PrerequisiteAcceptDecideRule It worked...
You can't crawl anything if not allowing DNS-lookup and fetching of robots.txt. This is exactly what PrerequisiteAcceptDecideRule does. best -- Bjarne Andersen...
Hi Mike, You are right. It works. I believe that the reason is that PrerequisiteAcceptDecideRule accepts all URIs the crawler has discovered *and* considered...
... RejectDecideRule REJECTs everything -- it is used to establish the default decision. It is then up to later rules to ACCEPT what you want and what is...
Hi, It's that time again. We're going to try 2.5B (maybe more) this time. We've upgraded our bandwidth to 250Mbps, all year round. So that means we're going to...
Bert & others - An update on this issue: The original attempted fix (of March 23) created other problems, but an alternate fix was applied the 26th that...
Hi, I am trying to monitor a Heritrix instance using JMX through a firewall. I am sure people have faced trouble with this before. As far as I understand it...
Hi Gordon, Thanks for your detailed explanation. I can see your point that adding PrerequisiteAcceptDecideRule never hurts because it doesn't have any...
Hi Gordon, Is there a reason why Heritrix only fetches 1 URL at a time, besides that establishing multiple connections to the same host might be considered a DoS...
I have heritrix-1.10.2 running on a dual-core Linux box with 2.8GHz CPUs and 8G memory. Heritrix often runs into an Out of Memory error. I don't recall...
I am writing a custom Processor (post-processor) for Heritrix and was implementing the innerProcess method to capture metadata on the URI. One of the things...
... Excellent! ... I am always partial to the latest releases, though you may want to make the determination based on your own reading of the changes/issues....
... Without specifically reproducing your error, if the problem is that the DNS fetch is being ruled out-of-scope, I believe the reason is that a 'dns:' URI is...
... Were you using the exact same JVM (esp. heap) and Heritrix (esp. Processors and UriUniqFilter) options in 1.8? How long does it take to OOME? A problematic...
This may be indicative of a bug, but I suspect even if so it is triggered by your atypical use of DecideRules on the LinksScoper. The LinksScoper doesn't...
Yes, I was using the same JVM with 1.8. The heap I had specified was 1G. Now with 1.10.2, I had the heap set to 1G to begin with. After I encountered the OOME,...
Arrhh, no wonder most of the time, most of my toe threads are sitting there and doing nothing. I remember that I read a member's post here, and he was using...
Hi Gordon, Yes, I can move the decide rules in LinksScoper to the main crawl Scope. My original thought was trying to discard the link in the earliest stage -...
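The decide-rule behavior discussed in several of the messages above (a RejectDecideRule that sets the default, later rules that ACCEPT, and PrerequisiteAcceptDecideRule added at the end of the chain) can be illustrated with a small standalone sketch. This is not Heritrix code: the class, the `Candidate` type, the `decide` helper, and the example.org prefix are all hypothetical, and it assumes that the last rule returning something other than PASS determines the outcome, as the thread implies.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a decide-rule chain. NOT the real Heritrix classes.
class DecideChainSketch {
    enum Decision { ACCEPT, REJECT, PASS }

    // Hypothetical stand-in for a candidate URI; 'prerequisite' marks
    // DNS lookups and robots.txt fetches scheduled on behalf of another URI.
    static final class Candidate {
        final String uri;
        final boolean prerequisite;
        Candidate(String uri, boolean prerequisite) {
            this.uri = uri;
            this.prerequisite = prerequisite;
        }
    }

    interface Rule { Decision decide(Candidate c); }

    // Assumption: every rule is consulted in order and the last non-PASS answer wins.
    static Decision evaluate(List<Rule> rules, Candidate c) {
        Decision result = Decision.PASS;
        for (Rule rule : rules) {
            Decision d = rule.decide(c);
            if (d != Decision.PASS) {
                result = d;
            }
        }
        return result;
    }

    // Builds a toy chain and evaluates one URI against it.
    static Decision decide(String uri, boolean prerequisite, boolean withPrerequisiteRule) {
        List<Rule> rules = new ArrayList<>();
        // Plays the role of RejectDecideRule: establishes REJECT as the default.
        rules.add(c -> Decision.REJECT);
        // Hypothetical scope rule: accept only URIs under one example prefix.
        rules.add(c -> c.uri.startsWith("http://example.org/") ? Decision.ACCEPT : Decision.PASS);
        if (withPrerequisiteRule) {
            // Plays the role of PrerequisiteAcceptDecideRule at the end of the chain.
            rules.add(c -> c.prerequisite ? Decision.ACCEPT : Decision.PASS);
        }
        return evaluate(rules, new Candidate(uri, prerequisite));
    }

    public static void main(String[] args) {
        System.out.println(decide("dns:example.org", true, false));
        System.out.println(decide("dns:example.org", true, true));
        System.out.println(decide("http://other.example/", false, true));
    }
}
```

In this toy model the `dns:` URI matches no scope prefix, so without the final rule it keeps the default REJECT, which is the kind of situation in which a prerequisite cannot be scheduled.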