Hi! I updated my HERITRIX installation from CVS, and now I can't crawl at all. I get alerts on every try. Could someone tell me whether the CVS version is...
Hello Bjarne. I just tried a build from HEAD and all seems to work fine. Perhaps your order file is from a previous version and the newer code has trouble ...
Michael Stack
stack@...
Apr 6, 2004 5:08 pm
304
The alerts all came up in the UI when configuring HERITRIX from inside the UI (using the Simple Profile). I returned to the official release 0.6.0 and it works...
Hello! Does HERITRIX handle cookies? In the UI there are two text fields for saving and loading a cookie file. When the crawler runs, does it save cookies...
Hi, I collected a nice test archive of about 100000 docs with heritrix 0.6.0. I think it went well (I didn't yet try out very baaad web sites ;). Now I try to...
... Yes it does. Handling of cookies is done by default. The load cookies option allows an operator to pre-load an existing cookies file (in Netscape's ...
Hello everyone, I'm trying to use heritrix on a Windows(!) platform. Whenever I submit a job via the web interface I get an error - here is the log (alert)...
Hi! We are testing HERITRIX in connection with harvesting specially selected websites. When harvesting only one website (on only one host/domain) the...
... A first step would be to eliminate the politeness delay between requests to a single server. To do this, in the 'frontier' section of the crawl job...
... Ok. Yeah, we don't support Windows but a couple of the fellas here develop and run heritrix from Windows and it seems to work fine (Send us your .bat...
Michael Stack
stack@...
Apr 14, 2004 11:40 pm
314
Hi there, I checked out ArchiveOpenCrawler from CVS and tried to build with maven. ... __ __ ... maven-1.0-beta-10.jar...
zhousp
zhousp@...
Apr 19, 2004 6:20 am
315
Hi, I tried to use "maven dist" to build, and it gives me the following error ... J:\work\heritrix\ArchiveOpenCrawler>maven dist __ __ ...
zhousp
zhousp@...
Apr 20, 2004 12:24 am
316
Did you do maven setup? See here: http://maven.apache.org/start/install.html. Can you do anything w/ maven such as print out all goals: ...
Michael Stack
stack@...
Apr 20, 2004 12:47 am
317
Hi Michael Stack, I have installed maven following http://maven.apache.org/start/install.html and "maven -g" gives me the right result. But when I use "maven...
zhousp
zhousp@...
Apr 20, 2004 1:51 am
318
... Hm. We use maven RC1 in our build. I just tried RC2 on a macintosh and got complaints about nonexistent targets ('jar', and 'dist'). Try RC1. I'd remove...
stack
stack@...
Apr 20, 2004 3:00 am
319
Hi Michael Stack, I added two lines to project.properties ... maven.repo.local=C:/Documents and Settings/Administrator/.maven/repository ...
zhousp
zhousp@...
Apr 20, 2004 9:22 am
320
... I'm glad though I don't understand why. Looks like it had legit values for the above two variables going by your error messages below. I'll add a note to...
stack
stack@...
Apr 20, 2004 1:17 pm
321
As part of a pending RFE, Heritrix's conventions for naming and formatting the 'version-block' header record of ARCs are due to change this week. This is a...
Hi all, good ideas you have, here are a couple of comments => It's not possible to tell from file names which job each arc file comes from? During harvesting...
The ARC naming proposal sounds good to me. The only thing I might ponder is the idea of a separate identifier, for the instance of a repeated crawl....
The proposal for adding the entire crawl order in the ARC-file sounds very attractive to me. But for very large crawls (e.g. a crawl of 250,000 Danish domains)...
Dear All. After having made a checkout from CVS, and made the distributions using 'maven dist', I get the following error: [heritrix@asterix heritrix]$...
I have no explanation for why you'd get the below exception (Looks like jetty trying to find main but it can't because not on classpath according to URL...
Michael Stack
stack@...
Apr 22, 2004 5:47 pm
327
Dear Michael. My failure to start heritrix is caused by some problem with the MANIFEST.MF created by the jar program. If I unzip heritrix-0.7.1.jar and then...
My manifest looks like this: Manifest-Version: 1.0^M Ant-Version: Apache Ant 1.5.3 ^M Created-By: Apache Jakarta Maven^M Built-By: stack^M Package:...
stack
stack@...
Apr 23, 2004 4:05 pm
329
Hi again. ... Yes, I am - so I'll just use maven rc1 instead. Obviously, it's not always a good thing to have the latest version(!). ... I am using IBM 1.4..1...