Hi there, I used Heritrix to crawl some site, and after crawling 91340 of 170008 URLs, I found Heritrix paused in the web UI, with some error messages listed on the console. ... ...
zhousp
zhousp@...
Jul 1, 2004 8:18 am
563
Hi all, where can I find JayBack? http://archive-crawler.sourceforge.net/cgi-bin/wiki.pl?JayBack Sorry, I can't find the URL to download it! ansi zhou...
zhousp
zhousp@...
Jul 1, 2004 8:34 am
564
... Ansi: There is no download. The page describes a tool yet-to-be developed. St.Ack...
stack
stack@...
Jul 1, 2004 2:17 pm
565
... OOME is a subclass of VirtualMachineError which states: "Thrown to indicate that the Java Virtual Machine is broken or has run out of resources necessary...
stack
stack@...
Jul 1, 2004 5:44 pm
566
Hi stack, I will try running Heritrix with more memory later. The last problem, in which Heritrix works fine in WSAD but not from my .bat file, have...
zhousp
zhousp@...
Jul 2, 2004 12:38 am
567
... Yes, Heritrix requires specially patched versions of a couple of classes in the Apache Jakarta HTTPClient library. These alternate versions are in the...
Hi all, I am unable to build from the 10.0-src tarball under Maven 1.0-rc2, using either the 'dist' or 'jar' goal (I usually use 'jar' since I haven't bothered...
Are you running w/ a 2.6 kernel Tom? The failures seemed to say so. If you are, see Andy Boyko's notes in ...
stack
stack@...
Jul 2, 2004 11:06 pm
570
... I'm not running Linux --- Mac OS X 10.3.4 (Darwin 7.4.0) --- but the problem appears to be the same. I opened a tracker issue on what I was seeing, ...
... What did you change Tom? Should we add it to core? St.Ack...
stack
stack@...
Jul 3, 2004 12:00 am
572
... I don't know if you want to add the change to the core or not: ... *************** *** 83,89 **** * @throws IOException */ private void lazyInitialize()...
Hi, couple of minor things I have come across. Building on XP - I had a quick go at setting up a build environment on my XP laptop at the weekend and it...
mark williamson
Mark.Williamson@...
Jul 5, 2004 1:11 pm
574
Hi, finally starting work on the re-visiting stuff: Is there a reason why stuff in the FrontierMarker is package access only? I'm setting up my new Frontier...
mark williamson
Mark.Williamson@...
Jul 5, 2004 1:31 pm
575
... No limit that I know of Mark. Was there anything in the file heritrix_out.log? Can you launch any program with a heap of 5gigs on that machine? St.Ack...
stack
stack@...
Jul 5, 2004 6:13 pm
576
... Pardon us Mark. You're probably first to have a go at other than a play Frontier or the default Frontier of our own making. I changed HEAD so that...
stack
stack@...
Jul 5, 2004 6:29 pm
577
... I just had same issue on macintosh. I'd say the 's' is necessary doing our disk-backed queues. Removing it might make for strange queue states. I guess...
stack
stack@...
Jul 5, 2004 6:39 pm
578
... Since when did you start using a Mac? :-) ... Write once, run anywhere, huh? ... Indeed: Linux Kernel 2.6 and Darwin 7.4.0. I haven't investigated to see...
I suspect this is a VM and OS issue. The following suggests Windows processes will only ever get 2GB of directly addressable memory, and the Java heap will...
Oops - I can see my message was a bit confusing. The 8GB machine is a dual Xeon Linux/Gentoo machine. The XP issue is from my XP laptop. cheers mark ... From:...
Williamson, Mark
Mark.Williamson@...
Jul 5, 2004 8:13 pm
581
Xeon is still a 32-bit processor, and 32-bit Linux also has an absolute 4GB ceiling on addressable memory for a single process -- probably even lower depending...
Hi, here is a patch to add a proxy server to the crawl settings. Patch-wise it's awful because I auto-formatted the code in Eclipse and so the format of the...
mark williamson
Mark.Williamson@...
Jul 6, 2004 2:58 pm
583
[I sent this under separate cover to stack, but figured I'd send it to the list as a whole as well] Attached is a patch to FetchHTTP.java that allows you to...
I have been having an issue with the max-time-sec property. I am testing using 120 seconds, since I do not want a crawl to last more than two minutes. Anyway...
This is a bug in 0.10. It's fixed in CVS, and it also worked correctly in 0.8.1. Andy Boyko aboy@... ... I noticed yesterday that it was re-fetching...
Oops. Forgot to attach the patch. Here it is. St.Ack ? hs_err_pid23051.log ? maven.log ? velocity.log Index: src/java/org/archive/crawler/fetcher/FetchHTTP.java ...
stack
stack@...
Jul 7, 2004 6:12 pm
591
You're a good man, Mark. The patch is a little ugly for sure (smile). I went through it and extracted the attached (I made proxy an expert setting). Does it...