archive-crawler
Messages

Messages 3857 - 3886 of 6147
3857
A DecideRule that shuttles pages to another running process asking whether to continue processing or not would make for a nice Heritrix contribution. What were...
Michael Stack
stackarchiveorg
Mar 1, 2007
6:54 pm
3858
Hi, does anybody have heritrix.war for Heritrix 1.10? Please mail it to me, as the site has been down for 2 days. When I try to deploy admin.war with all necessary jars in...
thiru_sundaram
Mar 2, 2007
6:10 am
3859
Hmm... If I explicitly remove the Extension lists from the MANIFEST.MF in heritrix.jar, it works fine. Is there any nice way of handling this without modifying...
thiru_sundaram
Mar 2, 2007
9:59 am
3860
Heritrix-as-a-WAR uses the container's authentication mechanism. If you haven't already, set up an 'admin' role with login/password for your container, for...
Michael Stack
stack@...
Mar 2, 2007
4:51 pm
3861
I was wondering if anyone was getting OutOfMemoryErrors thrown when checkpointing. On my system (8GB memory, 1GB allocated to the heap) I have checkpointing...
mikbrt
Mar 6, 2007
4:02 am
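When chasing OutOfMemoryErrors like the one described in 3861, it can help to confirm the heap ceiling the crawler's JVM actually received (set with -Xmx, e.g. -Xmx1024m for 1 GB), since that, not the machine's 8 GB of physical memory, is what runs out. A minimal sketch using the standard Runtime.maxMemory() call; HeapCheck is a hypothetical class name, and the figure it reports is approximate (some JVMs report slightly less than -Xmx, or Long.MAX_VALUE when no limit applies).

```java
public class HeapCheck {
    // Maximum heap this JVM will attempt to use, roughly the -Xmx value, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```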
3862
... It would be useful to see the lines of heritrix_out.log and progress_statistics.log before the point where the error occurred. How large is this crawl?...
Michael Magin
magin@...
Mar 6, 2007
6:20 pm
3863
What version of Heritrix are you running? If anything earlier than 1.10.2, I recommend upgrading: there's a fix to an issue with an included third-party...
Gordon Mohr
gojomo
Mar 6, 2007
7:49 pm
3864
I'm back looking at Heritrix and how I would implement an RPC decide rule, but I need to rethink how to go about this. I need Heritrix to make an RPC call,...
Kaleb
kalebmurphy
Mar 7, 2007
7:38 pm
3865
It looks like I've answered my own question. I'll be writing a decide rule which makes an RPC call. I'm not going to serialize the CrawlURI as I plan on...
Kaleb
kalebmurphy
Mar 8, 2007
4:44 am
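Kaleb's plan in 3865, a decide rule that makes an RPC call and sends the URI rather than a serialized CrawlURI, can be sketched as below. This is a self-contained illustration, not Heritrix code: Decision, RemoteDecider and RemoteDecideRuleSketch are hypothetical names, RemoteDecider is a plain interface standing in for whatever XML-RPC client is actually used, and the reply strings "accept"/"reject" are an assumed protocol. Any other reply, or a failed call, falls through to a configurable default.

```java
// Hypothetical three-valued outcome, modeled loosely on accept/reject/pass rules.
enum Decision { ACCEPT, REJECT, PASS }

// Hypothetical client for the external process; a real one might wrap an XML-RPC call.
interface RemoteDecider {
    String decide(String uri) throws Exception;
}

class RemoteDecideRuleSketch {
    private final RemoteDecider remote;
    private final Decision onError;

    RemoteDecideRuleSketch(RemoteDecider remote, Decision onError) {
        this.remote = remote;
        this.onError = onError;
    }

    // Send only the URI string and map the reply onto a Decision.
    Decision decisionFor(String uri) {
        try {
            String reply = remote.decide(uri);
            if ("accept".equals(reply)) return Decision.ACCEPT;
            if ("reject".equals(reply)) return Decision.REJECT;
            return Decision.PASS; // unknown or null reply: no opinion
        } catch (Exception e) {
            // Remote process unreachable or failed: use the fixed fallback.
            return onError;
        }
    }
}

public class RemoteDecideRuleDemo {
    public static void main(String[] args) {
        RemoteDecideRuleSketch rule = new RemoteDecideRuleSketch(
            uri -> uri.contains("example.org") ? "accept" : "reject",
            Decision.PASS);
        System.out.println(rule.decisionFor("http://example.org/a"));
    }
}
```

The fallback choice matters in practice: under the assumed three-valued convention, returning PASS when the external process is down leaves the outcome to the other rules instead of rejecting every URI.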
3866
I need to add the Apache XMLRPC jars to the Heritrix build, but I have no idea how to add these to Maven. Could someone point me in the right direction or at...
Kaleb
kalebmurphy
Mar 8, 2007
4:54 pm
3867
See the dependencies section in the project.xml. If the xmlrpc lib is available up in the maven1 ibiblio repository, maven should just fetch it for you (You...
Michael Stack
stackarchiveorg
Mar 8, 2007
5:17 pm
3868
First, thank you so much for the response. That was a ton of help. Just one more question. If I already have the jar files and want to use my local copies,...
Kaleb
kalebmurphy
Mar 8, 2007
5:42 pm
3869
... You need to add in both places. Make sure that the ID in the project.xml dependency section matches the ID suffix in your 'maven.jar.ID' entry in...
Michael Stack
stack@...
Mar 8, 2007
6:04 pm
3870
Thanks for the continued help. Building Perl projects is a lot different than this. I've managed to build Heritrix with the new jars in the project files and...
Kaleb
kalebmurphy
Mar 8, 2007
10:43 pm
3871
Never mind. Two seconds after I posted this, I realized that the runtime wasn't finding the libraries because Maven wasn't magically copying them into the...
Kaleb
kalebmurphy
Mar 8, 2007
10:51 pm
3872
... No worries. St.Ack...
Michael Stack
stackarchiveorg
Mar 8, 2007
11:55 pm
3873
Could someone point me at some documentation or give me a hint on what I'm doing wrong with this DecideRule I've created? It has the constructor which takes...
Kaleb
kalebmurphy
Mar 9, 2007
12:27 am
3874
... Which DR are you subclassing? If you override #decisionFor, is it called? Your rule is one of a set of rules in a DecidingScope (You say CrawlScope...
Michael Stack
stack@...
Mar 9, 2007
4:33 pm
3875
Woohooo, it works! Thank you so much for the help, St.Ack. I was subclassing DecideRule, which doesn't have an #evaluate to override. I misread the code....
Kaleb
kalebmurphy
Mar 9, 2007
6:47 pm
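The pitfall resolved in 3873-3875, writing a method the base class never calls, is something Java's @Override annotation catches at compile time. The classes below are hypothetical stand-ins, not Heritrix's DecideRule; they only illustrate the mechanism.

```java
// Hypothetical minimal base class standing in for a rule framework.
abstract class BaseRule {
    // The framework calls this method; subclasses customize it.
    String decisionFor(String uri) {
        return "PASS";
    }
}

class PdfRejectRule extends BaseRule {
    // @Override makes the compiler check that the superclass declares this
    // method. Had this been named evaluate(String), compilation would fail
    // instead of the method silently never being called.
    @Override
    String decisionFor(String uri) {
        return uri.endsWith(".pdf") ? "REJECT" : "PASS";
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        BaseRule rule = new PdfRejectRule();
        System.out.println(rule.decisionFor("http://example.org/x.pdf"));
    }
}
```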
3876
I am trying to build Heritrix from source (1.10.2) using Maven 1.0.2. I am using JDK 1.6 and I exported the following variables on my machine: export...
Ahmed Ghozia
ghouzia
Mar 11, 2007
7:29 pm
3877
Ahmed Ghozia, St.Ack touches on this issue here (The source is missing files): http://tech.groups.yahoo.com/group/archive-crawler/message/3850 Follow the...
Kaleb
kalebmurphy
Mar 12, 2007
4:48 am
3878
Thanks, Kaleb. I found a way around this problem: I used the Heritrix 1.10.0 source instead of 1.10.2 and it worked fine. Kaleb <lostokies@...> wrote:...
ahmed ghouzia
ghouzia
Mar 12, 2007
12:07 pm
3879
A release candidate build for the upcoming Heritrix 1.12.0 release is now available from the Heritrix build box, ...
Gordon Mohr
gojomo
Mar 12, 2007
11:00 pm
3880
... I heard that you are considering a move to Maven2; is this scheduled for sometime soon? Cheers, -- Laurian Gridinoc, purl.org/net/laur...
Laurian Gridinoc
lauriangridinoc
Mar 13, 2007
8:29 am
3881
Dear Gordon, I recently filed bug #1675749, "ARCWriter cannot handle records larger than 2 GB" ...
Søren Vejrup Carlsen
svc400
Mar 13, 2007
10:19 am
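The 2 GB threshold in bug #1675749 coincides with the largest value a Java int can hold (2^31 - 1, just under 2 GiB). The report does not say that ARCWriter stores a record length in an int; the sketch below only shows, hypothetically, what happens when a longer length is narrowed to one. TwoGigLimit and narrow are illustrative names.

```java
public class TwoGigLimit {
    // Narrowing a long to an int keeps only the low 32 bits.
    static int narrow(long n) {
        return (int) n;
    }

    public static void main(String[] args) {
        long recordLength = 3L * 1024 * 1024 * 1024; // a hypothetical 3 GiB record
        System.out.println(Integer.MAX_VALUE);      // 2147483647
        System.out.println(narrow(recordLength));   // -1073741824
    }
}
```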
3882
Hi Gordon, thanks for the great work and congrats on the release candidate. But... I have two issues: 1) How do I submit bug reports to the new system? Do I have...
pandae667
Mar 13, 2007
12:15 pm
3883
Hi Gordon, consider the last part of my post void - seems like for some reason I've been testing with a broken WARC file. But still I'm unable to use the v10...
pandae667
Mar 13, 2007
2:04 pm
3884
You need to be registered to submit a bug, Olaf. Go here: http://webteam.archive.org/jira/secure/Dashboard.jspa. Let us know if it doesn't work for you. ...
Michael Stack
stackarchiveorg
Mar 13, 2007
4:09 pm
3885
Could someone familiar with the Heritrix architecture help me out? From the CrawlURI, I'm trying to find the "depth" of the current URI from the seed URI. If the...
Kaleb
kalebmurphy
Mar 13, 2007
4:46 pm
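On the depth question in 3885: assuming Heritrix 1.x records on each CandidateURI/CrawlURI a "path from seed" string (getPathFromSeed()) with one letter per hop, for example L for a navigational link and E for an embed (please verify against the Javadoc for your version), depth can be computed from that string alone. HopDepth and its methods are hypothetical helpers; the sketch distinguishes total hops from link hops because embeds usually should not count as extra depth.

```java
public class HopDepth {
    // Total hops from the seed: one character per hop.
    static int totalHops(String pathFromSeed) {
        return pathFromSeed == null ? 0 : pathFromSeed.length();
    }

    // Navigational depth: count only 'L' (link) hops, ignoring embeds and others.
    static int linkHops(String pathFromSeed) {
        if (pathFromSeed == null) return 0;
        int n = 0;
        for (char c : pathFromSeed.toCharArray()) {
            if (c == 'L') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String path = "LLE"; // hypothetical: seed -> link -> link -> embed
        System.out.println(totalHops(path) + " " + linkHops(path)); // 3 2
    }
}
```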
3886
... I've been practicing the migration over on the archive-access sister project and it's been taking up a bunch of time. The move from m1 to m2, in effect,...
Michael Stack
stackarchiveorg
Mar 13, 2007
4:48 pm
Copyright © 2009 Yahoo! Inc. All rights reserved.