archive-crawler

Messages

Messages 562 - 591 of 6144
562
Hi there, I used Heritrix to crawl a site, and after crawling 91340 of 170008 URLs, I found Heritrix paused in the web UI, with some error messages listed on the console. ... ...
zhousp
zhousp@...
Jul 1, 2004
8:18 am
563
Hi all, where can I find JayBack? http://archive-crawler.sourceforge.net/cgi-bin/wiki.pl?JayBack Sorry, I can't find the URL to download it! ansi zhou...
zhousp
zhousp@...
Jul 1, 2004
8:34 am
564
... Ansi: There is no download. The page describes a tool yet-to-be developed. St.Ack...
stack
stack@...
Jul 1, 2004
2:17 pm
565
... OOME is a subclass of VirtualMachineError which states: "Thrown to indicate that the Java Virtual Machine is broken or has run out of resources necessary...
stack
stack@...
Jul 1, 2004
5:44 pm
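Message 565's point can be checked directly in Java: `OutOfMemoryError` extends `VirtualMachineError`, so it is an `Error` rather than an `Exception` and is not something a crawl is expected to recover from by catching it. A minimal sketch, assuming a standard JVM; the class name `OomeCheck` is hypothetical. It also prints the maximum heap the running JVM will try to use (`Runtime.maxMemory()`), which is the figure that raising `-Xmx` changes when you "use more memory".

```java
public class OomeCheck {
    // True when OutOfMemoryError sits under VirtualMachineError in the class hierarchy.
    static boolean oomeIsVirtualMachineError() {
        return VirtualMachineError.class.isAssignableFrom(OutOfMemoryError.class);
    }

    // True when OutOfMemoryError is an Error and not any kind of Exception.
    static boolean oomeIsError() {
        return Error.class.isAssignableFrom(OutOfMemoryError.class)
                && !Exception.class.isAssignableFrom(OutOfMemoryError.class);
    }

    // Maximum heap in megabytes, as reported by the running JVM (reflects -Xmx).
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("OOME is a VirtualMachineError: " + oomeIsVirtualMachineError());
        System.out.println("OOME is an Error, not an Exception: " + oomeIsError());
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

Running it as `java -Xmx512m OomeCheck` should report a max heap at or somewhat below 512 MB; the exact figure depends on the JVM and garbage collector.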
566
Hi stack, I will try to use more memory to run Heritrix later. The last problem, where Heritrix works fine in WSAD but not from my .bat file, have...
zhousp
zhousp@...
Jul 2, 2004
12:38 am
567
... Yes, Heritrix requires special patched versions of a couple of classes in the Apache Jakarta HTTPClient library. These alternate versions are in the...
Gordon Mohr (Internet...
gojomo
Jul 2, 2004
2:21 am
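Messages 566 and 567 suggest why a crawl can work inside WSAD but fail from a `.bat` file: Heritrix needs its patched HttpClient classes, so those must be picked up ahead of any stock copies on the classpath. A minimal diagnostic sketch, not part of Heritrix, that reports which jar or directory a named class was actually loaded from. `WhichJar` is a hypothetical name, and `org.apache.commons.httpclient.HttpClient` below is only an example class; the thread does not say which classes were patched.

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the location (jar or directory) a class was loaded from,
    // or a marker string for bootstrap classes and classes that cannot be found.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(bootstrap or unknown location)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not found on classpath)";
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same `-cp` value the `.bat` file uses (plus the directory holding `WhichJar`), e.g. `java WhichJar org.apache.commons.httpclient.HttpClient`, and compare the reported location with what WSAD uses.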
568
Hi all, I am unable to build from the 10.0-src tarball under Maven 1.0-rc2, using either the 'dist' or 'jar' goal (I usually use 'jar' since I haven't bothered...
Tom Emerson
tree02139
Jul 2, 2004
10:30 pm
569
Are you running w/ a 2.6 kernel Tom? The failures seemed to say so. If you are, see Andy Boyko's notes in ...
stack
stack@...
Jul 2, 2004
11:06 pm
570
... I'm not running Linux --- Mac OS X 10.3.4 (Darwin 7.4.0) --- but the problem appears to be the same. I opened a tracker issue on what I was seeing, ...
Tom Emerson
tree02139
Jul 2, 2004
11:22 pm
571
... What did you change Tom? Should we add it to core? St.Ack...
stack
stack@...
Jul 3, 2004
12:00 am
572
... I don't know if you want to add the change to the core or not: ... *************** *** 83,89 **** * @throws IOException */ private void lazyInitialize()...
Tom Emerson
tree02139
Jul 3, 2004
12:06 am
573
Hi, a couple of minor things I have come across. Building on XP - I had a quick go at setting up a build environment on my XP laptop at the weekend and it...
mark williamson
Mark.Williamson@...
Jul 5, 2004
1:11 pm
574
Hi, finally starting work on the re-visiting stuff: Is there a reason why stuff in the FrontierMarker is package access only? I'm setting up my new Frontier...
mark williamson
Mark.Williamson@...
Jul 5, 2004
1:31 pm
575
... No limit that I know of Mark. Was there anything in the file heritrix_out.log? Can you launch any program with a heap of 5gigs on that machine? St.Ack...
stack
stack@...
Jul 5, 2004
6:13 pm
576
... Pardon us Mark. You're probably first to have a go at other than a play Frontier or the default Frontier of our own making. I changed HEAD so that...
stack
stack@...
Jul 5, 2004
6:29 pm
577
... I just had the same issue on Macintosh. I'd say the 's' is necessary doing our disk-backed queues. Removing it might make for strange queue states. I guess...
stack
stack@...
Jul 5, 2004
6:39 pm
578
... Since when did you start using a Mac? :-) ... Write once, run anywhere, huh? ... Indeed: Linux Kernel 2.6 and Darwin 7.4.0. I haven't investigated to see...
Tom Emerson
tree02139
Jul 5, 2004
6:49 pm
579
I suspect this is a VM and OS issue. The following suggests Windows processes will only ever get 2GB of directly addressable memory, and the Java heap will...
Gordon Mohr (@Interne...
gojomo
Jul 5, 2004
8:00 pm
580
Oops - I can see my message was a bit confusing. The 8GB machine is a dual Xeon Linux/Gentoo machine. The XP issue is from my XP laptop. Cheers, Mark ... From:...
Williamson, Mark
Mark.Williamson@...
Jul 5, 2004
8:13 pm
581
Xeon is still a 32-bit processor, and 32-bit Linux also has an absolute 4GB ceiling on addressable memory for a single process -- probably even lower depending...
Gordon Mohr (@Interne...
gojomo
Jul 6, 2004
8:38 am
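Messages 579 and 581 come down to address-space arithmetic: a 32-bit pointer can address at most 2^32 bytes (4 GB), and per message 579 a Windows process gets only 2 GB of that, so a 5 GB heap cannot fit in one 32-bit process no matter how much RAM the machine has. A minimal sketch of that check, assuming `-Xmx`-style sizes with an optional `k`, `m` or `g` suffix; `HeapCeiling` and its methods are hypothetical names. Passing the check is necessary but not sufficient, since the JVM itself, native libraries and thread stacks share the same address space, which is why message 581 expects the practical limit to be lower.

```java
public class HeapCeiling {
    // 2^32 bytes: everything a 32-bit pointer can address (4 GB).
    static final long ADDRESS_SPACE_32 = 1L << 32;
    // 2^31 bytes: the 2 GB per-process figure message 579 cites for Windows.
    static final long WINDOWS_USER_SPACE_32 = 1L << 31;

    // Parses an -Xmx-style size such as "5g", "1500m", "512k" or "1048576" into bytes.
    static long parseSize(String size) {
        String s = size.trim().toLowerCase();
        long multiplier = 1;
        char last = s.charAt(s.length() - 1);
        if (last == 'k' || last == 'm' || last == 'g') {
            multiplier = last == 'k' ? 1L << 10 : last == 'm' ? 1L << 20 : 1L << 30;
            s = s.substring(0, s.length() - 1);
        }
        return Long.parseLong(s) * multiplier;
    }

    // True if a heap of this size is strictly below the ceiling (necessary, not sufficient).
    static boolean fitsUnder(String heap, long ceilingBytes) {
        return parseSize(heap) < ceilingBytes;
    }

    public static void main(String[] args) {
        for (String heap : new String[] {"5g", "3g", "1500m"}) {
            System.out.printf("%-6s fits 32-bit (4 GB): %-5b fits Windows 32-bit (2 GB): %b%n",
                    heap, fitsUnder(heap, ADDRESS_SPACE_32), fitsUnder(heap, WINDOWS_USER_SPACE_32));
        }
    }
}
```

With the sample sizes, `5g` fits under neither ceiling, `3g` fits only under 4 GB, and `1500m` fits under both.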
582
Hi, here is a patch to add a proxy server to the crawl settings. Patch-wise it's awful because I auto-formatted the code in Eclipse and so the format of the...
mark williamson
Mark.Williamson@...
Jul 6, 2004
2:58 pm
583
[I sent this under separate cover to stack, but figured I'd send it to the list as a whole as well] Attached is a patch to FetchHTTP.java that allows you to...
Tom Emerson
tree02139
Jul 6, 2004
5:01 pm
584
I have been having an issue with the max-time-sec property. I am testing using 120 seconds, since I do not want a crawl to last more than two minutes. Anyway...
jirleech
Jul 7, 2004
3:54 pm
585
I tested the patch and committed it. It's pretty. Good stuff Tom. Thanks, St.Ack P.S. I added you as a project contributor: ...
stack
stack@...
Jul 7, 2004
3:57 pm
586
... The settings framework is very cool and easy to use. Kudos to you all for that. ... Thanks. ... Not at all. -tree -- Tom Emerson...
Tom Emerson
tree02139
Jul 7, 2004
4:26 pm
587
... Sounds like a bug. Sure it's not waiting on DNS/robots? Any chance of sending us the logs -- all that's mentioned in the crawl manifest plus ...
stack
stack@...
Jul 7, 2004
4:54 pm
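One generic way to enforce a wall-clock limit like the `max-time-sec` setting from message 584 is to compare elapsed time against the limit between units of work. A minimal sketch under that assumption; it is not Heritrix's implementation, and `CrawlDeadline`, its methods, and the "0 means no limit" rule are hypothetical. With a check of this kind, a single fetch blocked on DNS or robots.txt delays the stop until that fetch returns, which is the sort of overrun message 587 asks about.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class CrawlDeadline {
    private final long startNanos;
    private final long limitNanos;
    private final LongSupplier clock; // injectable so the check can be tested deterministically

    // maxTimeSec <= 0 is treated as "no limit" (an assumption for this sketch).
    CrawlDeadline(long maxTimeSec, LongSupplier clock) {
        this.clock = clock;
        this.startNanos = clock.getAsLong();
        this.limitNanos = maxTimeSec > 0 ? TimeUnit.SECONDS.toNanos(maxTimeSec) : Long.MAX_VALUE;
    }

    CrawlDeadline(long maxTimeSec) {
        this(maxTimeSec, System::nanoTime);
    }

    // Checked between URIs; a fetch that blocks (e.g. on DNS or robots.txt) delays this check.
    boolean expired() {
        return clock.getAsLong() - startNanos >= limitNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        CrawlDeadline deadline = new CrawlDeadline(1);
        int processed = 0;
        while (!deadline.expired()) {
            Thread.sleep(200); // stands in for fetching and processing one URI
            processed++;
        }
        System.out.println("Stopped after " + processed + " simulated fetches");
    }
}
```

The injectable clock is a design choice for testing; a real crawler would also need a way to interrupt or time out blocked fetches if the limit must be strict.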
588
I noticed yesterday that it was re-fetching pages that had # bookmarks in them, i.e. it grabbed: http://www.site.com/randompage.html ...
robeger
Jul 7, 2004
5:55 pm
589
This is a bug in 0.10. It's fixed in CVS, and it also worked correctly in 0.8.1. Andy Boyko aboy@... ... I noticed yesterday that it was re-fetching...
Andrew Boyko
andyboyko
Jul 7, 2004
6:05 pm
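Messages 588 and 589 describe 0.10 re-fetching a page once per `#` bookmark. The fragment after `#` is handled by the browser and never sent to the server, so a crawler can drop it before checking whether a URI has already been seen. A minimal sketch of that idea; it is not the fix that went into CVS, and `FragmentDedup`, `stripFragment`, `toFetch`, and the `#top`/`#section2`/`other.html` URLs are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FragmentDedup {
    // Removes everything from the first '#' on; in a well-formed URI,
    // an unescaped '#' can only be the fragment delimiter.
    static String stripFragment(String uri) {
        int hash = uri.indexOf('#');
        return hash >= 0 ? uri.substring(0, hash) : uri;
    }

    // Returns the distinct URIs to fetch, in discovery order, ignoring fragments.
    static Set<String> toFetch(List<String> discovered) {
        Set<String> seen = new LinkedHashSet<>();
        for (String uri : discovered) {
            seen.add(stripFragment(uri));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> discovered = List.of(
                "http://www.site.com/randompage.html",
                "http://www.site.com/randompage.html#top",
                "http://www.site.com/randompage.html#section2",
                "http://www.site.com/other.html#top");
        System.out.println(toFetch(discovered));
    }
}
```

Run as-is, it prints `[http://www.site.com/randompage.html, http://www.site.com/other.html]`: the three bookmark variants collapse to one fetch.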
590
Oops. Forgot to attach the patch. Here it is. St.Ack ? hs_err_pid23051.log ? maven.log ? velocity.log Index: src/java/org/archive/crawler/fetcher/FetchHTTP.java ...
stack
stack@...
Jul 7, 2004
6:12 pm
591
You're a good man Mark. The patch is a little ugly for sure (smile). I went through it and extracted the attached (I made proxy an expert setting). Does it...
stack
stack@...
Jul 7, 2004
6:12 pm

Copyright © 2009 Yahoo! Inc. All rights reserved.