archive-crawler
Messages 302 - 331 of 6142
302
Hi ! I updated my HERITRIX installation from CVS - and now I can't crawl at all - I get alerts on every try: Could someone tell me whether the CVS version is...
bja@...
bjarne_dk2000
Apr 6, 2004
12:36 pm
303
Hello Bjarne. I just tried a build from HEAD and all seems to work fine. Perhaps your order file is from a previous version and the newer code has trouble ...
Michael Stack
stack@...
Apr 6, 2004
5:08 pm
304
The alerts all came up in the UI - when configuring HERITRIX from inside the UI (using the Simple Profile) I returned to the official release 0.6.0 - it works...
bja@...
bjarne_dk2000
Apr 7, 2004
7:41 am
305
Hello! Does HERITRIX handle cookies? In the UI there are two text fields for saving and loading a cookie file. When the crawler runs - does it save cookies...
bja@...
bjarne_dk2000
Apr 7, 2004
7:45 am
306
Hi, I collected a nice test archive of about 100000 docs with heritrix 0.6.0 I think it went well (I didn't yet try out very baaad web sites;) Now I try to...
kaisa_kaunonen
Apr 7, 2004
11:13 am
307
... Yes it does. Handling of cookies is done by default. The load cookies option allows an operator to pre-load an existing cookies file (in the Netscape's ...
Igor Ranitovic
iranitovic
Apr 7, 2004
6:14 pm
308
Hello everyone, I'm trying to use Heritrix on a Windows(!) platform. Whenever I submit a job via the web interface I get an error - here is the log (alert)...
thomasschmegg
Apr 13, 2004
12:23 pm
309
Thomas: Which version of heritrix? Is it a release or cvs HEAD? Thanks, St.Ack...
stack
stack@...
Apr 13, 2004
3:21 pm
310
... Oh, sorry for the lack of that information. I'm using version 0.6.0, which I downloaded from the Heritrix homepage. ... Windows?...
thomasschmegg
Apr 14, 2004
11:27 am
311
Hi ! We are testing HERITRIX in connection with harvesting specially selected websites - when harvesting only one website (on only one host / domain) the...
bja@...
bjarne_dk2000
Apr 14, 2004
7:26 pm
312
... A first step would be to eliminate the politeness delay between requests to a single server. To do this, in the 'frontier' section of the crawl job...
Gordon Mohr
gojomo
Apr 14, 2004
7:55 pm
313
... Ok. Yeah, we don't support windows but a couple of the fellas here develop and run heritrix from windows and it seems to work fine (Send us your .bat...
Michael Stack
stack@...
Apr 14, 2004
11:40 pm
314
Hi there, I checked out ArchiveOpenCrawler from CVS and tried to build with Maven. ... maven-1.0-beta-10.jar...
zhousp
zhousp@...
Apr 19, 2004
6:20 am
315
Hi, I tried to use "maven dist" to build; it gives me the following error ... J:\work\heritrix\ArchiveOpenCrawler>maven dist ...
zhousp
zhousp@...
Apr 20, 2004
12:24 am
316
Did you do maven setup? See here: http://maven.apache.org/start/install.html. Can you do anything w/ maven such as print out all goals: ...
Michael Stack
stack@...
Apr 20, 2004
12:47 am
317
Hi Michael Stack, I installed Maven following http://maven.apache.org/start/install.html and "maven -g" gives me the right result, but when I use "maven...
zhousp
zhousp@...
Apr 20, 2004
1:51 am
318
... Hm. We use maven RC1 in our build. I just tried RC2 on a macintosh and got complaints about nonexistent targets ('jar', and 'dist'). Try RC1. I'd remove...
stack
stack@...
Apr 20, 2004
3:00 am
319
Hi Michael Stack, I added two lines to project.properties ... maven.repo.local=C:/Documents and Settings/Administrator/.maven/repository ...
zhousp
zhousp@...
Apr 20, 2004
9:22 am
320
... I'm glad though I don't understand why. Looks like it had legit values for the above two variables going by your error messages below. I'll add a note to...
stack
stack@...
Apr 20, 2004
1:17 pm
321
As part of a pending RFE, Heritrix's conventions for naming and formatting the 'version-block' header record of ARCs are due to change this week. This is a...
Gordon Mohr
gojomo
Apr 21, 2004
12:00 am
322
Hi all, good ideas you have, here are a couple of comments => It's not possible to tell from file names which job each arc file comes from? During harvesting...
kaisa_kaunonen
Apr 21, 2004
1:20 pm
323
The ARC naming proposal sounds good to me. The only thing I might ponder is the idea of a separate identifier, for the instance of a repeated crawl....
Andrew Boyko
andyboyko
Apr 21, 2004
4:32 pm
324
The proposal for adding the entire crawl order in the ARC file sounds very attractive to me. But for very large crawls (e.g. a crawl of 250.000 Danish domains)...
bja@...
bjarne_dk2000
Apr 22, 2004
6:56 am
325
Dear All. After having made a checkout from CVS and built the distributions using 'maven dist', I get the following error: [heritrix@asterix heritrix]$...
Søren Vejrup Carlsen
svc400
Apr 22, 2004
2:23 pm
326
I have no explanation for why you'd get the below exception (Looks like jetty trying to find main but it can't because not on classpath according to URL...
Michael Stack
stack@...
Apr 22, 2004
5:47 pm
327
Dear Michael. My failure to start heritrix is caused by some problem with the MANIFEST.MF created by the jar program. If I unzip heritrix-0.7.1.jar and then...
Søren Vejrup Carlsen
svc400
Apr 23, 2004
3:53 pm
328
My manifest looks like this: Manifest-Version: 1.0^M Ant-Version: Apache Ant 1.5.3 ^M Created-By: Apache Jakarta Maven^M Built-By: stack^M Package:...
stack
stack@...
Apr 23, 2004
4:05 pm
329
Hi again. ... Yes, I am - so I'll just use maven rc1 instead. Obviously, it's not always a good thing to have the latest version(!). ... I am using IBM 1.4..1...
Søren Vejrup Carlsen
svc400
Apr 23, 2004
4:20 pm
330
Dear All. When I start heritrix-0.7.1 (as of April 23, 2004), I get this message in the heritrix_out.log: EVENT The scratchDir you specified: ...
Søren Vejrup Carlsen
svc400
Apr 23, 2004
5:49 pm
331
... Ok. I need to spend some time on figuring building w/ RC2. ... You should use the sun 1.4.2 jdk when building (Says so here ...
stack
stack@...
Apr 23, 2004
6:01 pm