archive-crawler
Messages

Messages 3997 - 4026 of 6140
3997
Hi, I am pretty new to Heritrix. Once in a while I get the error: Heritrix(-63)-Prerequisite unschedulable failure. What does it mean? I looked up the user manual...
Hei
structurechart
Apr 2, 2007
6:42 am
3998
... http://crawler.archive.org/xref/org/archive/crawler/datamodel/FetchStatusCodes.html#38 ... Oops forgot to provide more info about my setting: ...
Hei
structurechart
Apr 2, 2007
6:56 am
3999
This must be because of 'bad' seeds - which hosts are you trying to crawl? best -- Bjarne Andersen Daily Manager - netarchive.dk State & University Library ...
Bjarne Andersen
bjarne_dk2000
Apr 2, 2007
7:33 am
4000
Hi. Attached to this mail you will find a newsletter from the netarchive.dk project - March 2007. The newsletter gives updates and news on both a collecting...
Bjarne Andersen
bjarne_dk2000
Apr 2, 2007
9:23 am
4001
... I am sure someone can give a reason for why this is needed but try adding this decide rule to the end of your chain: PrerequisiteAcceptDecideRule It worked...
mbarlotta
Apr 2, 2007
1:22 pm
4002
You can't crawl anything without allowing DNS lookup and fetching of robots.txt. Allowing both is exactly what PrerequisiteAcceptDecideRule does. best -- Bjarne Andersen...
Bjarne Andersen
bjarne_dk2000
Apr 2, 2007
1:50 pm
4003
Hi Mike, You are right. It works. I believe that the reason is that PrerequisiteAcceptDecideRule accepts all URIs the crawler has discovered *and* considered...
Hei
structurechart
Apr 2, 2007
5:45 pm
4004
Hi Bjarne, Thanks for your time. Are you saying that RejectDecideRule actually disallows DNS lookup and fetching robots.txt? If I put...
Hei Chan
structurechart
Apr 2, 2007
5:49 pm
4005
... RejectDecideRule REJECTs everything -- it is used to establish the default decision. It is then up to later rules to ACCEPT what you want and what is...
Gordon Mohr
gojomo
Apr 2, 2007
8:57 pm
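A minimal sketch of the behaviour discussed in messages 4001 to 4005, assuming a rule chain in which every rule may override the running decision and the last rule with an opinion wins. This is not the Heritrix API: the names `Decision`, `reject_all`, `accept_host`, `accept_prerequisites`, and `decide` are hypothetical, and recognising prerequisites by URI shape (`dns:` or `/robots.txt`) is a simplification chosen only for illustration.

```python
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    ACCEPT = "ACCEPT"
    REJECT = "REJECT"
    PASS = "PASS"  # the rule has no opinion; the running decision is kept


Rule = Callable[[str], Decision]


def reject_all(uri: str) -> Decision:
    """Stand-in for a rule that rejects everything, setting the default."""
    return Decision.REJECT


def accept_host(host: str) -> Rule:
    """Hypothetical scope rule: accept http URIs on one host."""
    def rule(uri: str) -> Decision:
        if uri.startswith(f"http://{host}/"):
            return Decision.ACCEPT
        return Decision.PASS
    return rule


def accept_prerequisites(uri: str) -> Decision:
    """Stand-in for a prerequisite-accepting rule (assumption: prerequisites
    are recognised here by URI shape only)."""
    if uri.startswith("dns:") or uri.endswith("/robots.txt"):
        return Decision.ACCEPT
    return Decision.PASS


def decide(rules: List[Rule], uri: str) -> Decision:
    """Apply every rule in order; the last non-PASS result is the decision."""
    decision = Decision.PASS
    for rule in rules:
        result = rule(uri)
        if result is not Decision.PASS:
            decision = result
    return decision


if __name__ == "__main__":
    without_prereq = [reject_all, accept_host("example.org")]
    with_prereq = without_prereq + [accept_prerequisites]
    for uri in ("http://example.org/page.html", "dns:example.org"):
        print(uri, decide(without_prereq, uri).value, decide(with_prereq, uri).value)
```

In this sketch the `dns:` URI is rejected by the shorter chain because only the reject-everything default has an opinion about it, which mirrors the advice in the thread to add a prerequisite-accepting rule at the end of the chain.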
4006
Hi, It's that time again. We're going to try 2.5B (maybe more) this time. We've upgraded our bandwidth to 250Mbps, all year round. So that means we're going to...
joehung302
Apr 2, 2007
9:06 pm
4007
Bert & others - An update on this issue: The original attempted fix (of March 23) created other problems, but an alternate fix was applied the 26th that...
Gordon Mohr
gojomo
Apr 3, 2007
12:51 am
4008
Hi, I am trying to monitor a Heritrix instance using JMX through a firewall. I am sure people have faced trouble with this before. As far as I understand it...
Anmol Bhasin
molzbh
Apr 3, 2007
1:09 am
4009
Hi Gordon, Thanks for your detailed explanation. I can see your point that adding PrerequisiteAcceptDecideRule never hurts because it doesn't have any...
Hei Chan
structurechart
Apr 3, 2007
3:57 am
4010
Hi, I am trying to understand the processing steps/chain; however, the most important graph is broken: ...
Hei
structurechart
Apr 3, 2007
6:16 am
4011
Hi Gordon, Is there a reason why Heritrix only fetches 1 URL at a time, other than that establishing multiple connections to the same host might be considered a DoS...
Hei
structurechart
Apr 3, 2007
6:53 am
4012
I just randomly set up a job and ran it for about 20 seconds. Then I started getting java.lang.ClassCastException. Here is the runtime-errors.log: ...
Hei
structurechart
Apr 3, 2007
7:40 am
4013
I have heritrix-1.10.2 running on a dual-core Linux box with 2.8GHz CPUs and 8G memory. Heritrix is often running into an Out of Memory error. I don't recall...
anitabidari
Apr 3, 2007
8:55 pm
4014
I am writing a custom Processor (post-processor) for Heritrix and was implementing the innerProcess method to capture meta data on the URI. One of the things...
mbarlotta
Apr 3, 2007
9:01 pm
4015
Should be fixed now. Thanks for pointing out the broken link. St.Ack ... (http://crawler.archive.org/articles/user_manual/config.html#processors)....
stackarchiveorg
Apr 3, 2007
9:30 pm
4016
JMX over RMI and firewalls do not get along too well (Here is an umbrella posting on the issue: ...
Michael Stack
stackarchiveorg
Apr 3, 2007
9:50 pm
4017
... Excellent! ... I am always partial to the latest releases, though you may want to make the determination based on your own reading of the changes/issues....
Gordon Mohr
gojomo
Apr 3, 2007
10:24 pm
4018
... Without specifically reproducing your error, if the problem is that the DNS fetch is being ruled out-of-scope, I believe the reason is that a 'dns:' URI is...
Gordon Mohr
gojomo
Apr 3, 2007
10:31 pm
4019
... That's essentially the reason -- the assumption that more than one connection at a time is impolite is deeply built into the Heritrix queueing...
Gordon Mohr
gojomo
Apr 3, 2007
10:33 pm
4020
... Were you using the exact same JVM (esp. heap) and Heritrix (esp. Processors and UriUniqFilter) options in 1.8? How long does it take to OOME? A problematic...
Gordon Mohr
gojomo
Apr 3, 2007
10:44 pm
4021
This may be indicative of a bug, but I suspect even if so it is triggered by your atypical use of DecideRules on the LinksScoper. The LinksScoper doesn't...
Gordon Mohr
gojomo
Apr 3, 2007
10:53 pm
4022
Discover How To Get Paid To Use Your Digital Camera! Get Ready to Make Money Online With Your Digital Camera! All You Need is a Digital Camera & Internet...
Online Jobs
mondey_5000
Apr 4, 2007
12:03 am
4023
Yes, I was using the same JVM with 1.8. The heap I had specified was 1G. Now with 1.10.2, I had the heap set to 1G to begin with. After I encountered the OOME,...
anitabidari
Apr 4, 2007
1:20 am
4024
Arrhh, no wonder most of the time, most of my toe threads are sitting there and doing nothing. I remember that I read a member's post here, and he was using...
Hei
structurechart
Apr 4, 2007
5:48 am
4025
Yea, I can see it now. Thanks. Cheers, Hei ... (http://crawler.archive.org/articles/user_manual/config.html#processors)....
Hei
structurechart
Apr 4, 2007
6:22 am
4026
Hi Gordon, Yes, I can move the decide rules in LinksScoper to the main crawl Scope. My original thought was trying to discard the link in the earliest stage -...
Hei
structurechart
Apr 4, 2007
7:47 am
