archive-crawler
Messages

Messages 2014 - 2043 of 6147
2014
... Hello Kaisa. ... Is this the only test that fails? Off-the-list, Mike Schwartz reports that retrying the build, the test passes second-time around. Is...
stack
stackarchiveorg
Jul 1, 2005
5:52 pm
2015
... Probably because we include the JMX reference implementation in Heritrix and it's clashing with the JBoss JMX implementation. What happens if you leave in...
stack
stackarchiveorg
Jul 1, 2005
7:19 pm
2016
... We mustn't be making use of TypeHandler otherwise I'd imagine there'd be compile-time complaints. What tool are you using to do the dependency checking? ...
stack
stackarchiveorg
Jul 1, 2005
7:34 pm
2017
I'm trying to only save documents that have a certain pattern in the body of the document. I can't figure out a way to do this. It's possible to filter on...
video_guid
Jul 1, 2005
10:22 pm
2018
Thanks for the response. I'm trying to crawl as many English-language sites as possible. I have a cluster of a few machines I can dedicate to this task. I'm ...
video_guid
Jul 1, 2005
10:25 pm
2019
... There is no such filter in Heritrix currently. You'll have to write one. Do it as a standalone filter or as a DecideRule to include in a DecidingFilter....
stack
stackarchiveorg
Jul 1, 2005
10:49 pm
2020
... write ... a ... own ... on ... running ... I'll give it a go. Thanks for the pointers....
video_guid
Jul 1, 2005
11:11 pm
2021
... There is not yet an automated way of running a crawl on a cluster. We're working on it. Meantime, in house, we've been using the recently added...
stack
stackarchiveorg
Jul 1, 2005
11:19 pm
2022
... You might also take a look at the recent Rainbow interface contribution (See http://groups.yahoo.com/group/archive-crawler/message/1905). You might recast...
stack
stackarchiveorg
Jul 1, 2005
11:27 pm
2023
Yes, this test seems to be the only one failing. It always fails; a second build does not help. I wrote into the project.xml ...
Kaisa Kaunonen
kaisa_kaunonen
Jul 2, 2005
10:08 am
2024
I'd like to highlight two recent contributions. 1. Mark Williamson of the British Library organized the contribution of Hedaern, an ARC access tool. The...
stack
stackarchiveorg
Jul 4, 2005
7:10 pm
2025
... Looks like this failure is easy to reproduce if the build is done on Fedora. For now I've made an issue and commented out this test of unused functionality. ......
stack
stackarchiveorg
Jul 4, 2005
10:40 pm
2026
Hi, I actually use only the heritrix-1.4.0.jar in our system, without the jmxri*.jar and the jmxtools*.jar files from your distribution. As I have pointed out,...
Holger Stenzhorn
holgerstenzhorn
Jul 5, 2005
10:00 am
2027
Hi again, I simply checked out a new version of Heritrix from SourceForge CVS this morning and the build went through, although it had some 33 warnings in...
Kaisa Kaunonen
kaisa_kaunonen
Jul 5, 2005
12:40 pm
2028
... That's odd that it would start working like that (the javadoc warnings need fixing, but they're harmless). ... Retry Kaisa. The below is an issue w/...
stack
stackarchiveorg
Jul 5, 2005
2:27 pm
2029
Hi, I am using Heritrix on a non-English Windows (i.e. German) and I constantly got a NullPointerException that could be traced to the line ...
Holger Stenzhorn
holgerstenzhorn
Jul 5, 2005
6:10 pm
2030
I cannot build the latest version from HEAD: java:compile: [echo] Compiling to /tmp/heritrix-1.5.0-200507050934/target/classes [javac] Compiling 416 source...
bja@...
bjarne_dk2000
Jul 5, 2005
6:17 pm
2031
... My fault. Mistaken commit. It's been removed. Sorry about that. St.Ack...
stack
stackarchiveorg
Jul 5, 2005
6:39 pm
2032
I've made a simple new DomainnameQueueAssignmentPolicy that bases queues on domain names instead of hostnames (domain defined as the last 2 names in the hostname)...
bja@...
bjarne_dk2000
Jul 5, 2005
6:56 pm
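Message 2032 describes keying queues on the domain name, taken as the last two names of the hostname. A minimal sketch of that key derivation in plain Java might look like the following. The class and method names (`DomainQueueKey`, `queueKey`) are hypothetical; this is not the contributed DomainnameQueueAssignmentPolicy and it uses no Heritrix API.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Heritrix): derives a queue key from a hostname. */
final class DomainQueueKey {
    private DomainQueueKey() {}

    /**
     * Returns the last two dot-separated labels of the host,
     * or the whole host if it has fewer than three labels.
     */
    static String queueKey(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        // Drop the trailing dot of a fully qualified name such as "example.com."
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1);
        }
        int last = h.lastIndexOf('.');
        if (last < 0) {
            return h; // single label, e.g. "localhost"
        }
        int prev = h.lastIndexOf('.', last - 1);
        // No second dot means the host already has only two labels.
        return prev < 0 ? h : h.substring(prev + 1);
    }
}
```

Under this rule `www.example.com` and `example.com` share one key. The two-label rule is naive: it lumps every host under a suffix such as `co.uk` into one queue and reduces a dotted IP address to its last two octets, so a real policy would probably need special cases.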
2033
To use the new DomainnameQueueAssignmentPolicy a minor change has to be made in AbstractFrontier.java private final static String []...
bja@...
bjarne_dk2000
Jul 5, 2005
7:59 pm
2034
Is it deliberate that hosts-report.txt has changed format to (in HEAD): [#urls] [#bytes] [host] 538 32710 dns: 10 38453 130.226.47.102 10 41802 www.etracker.de...
bja@...
bjarne_dk2000
Jul 5, 2005
8:44 pm
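The layout quoted in message 2034 puts the URL count first, the byte count second, and the host last. Assuming the three columns are separated by whitespace (the snippet does not say whether spaces or tabs are used), a line could be parsed roughly as below. `HostReportLine` is a hypothetical class written for illustration, not part of Heritrix.

```java
/** Hypothetical value class for one "[#urls] [#bytes] [host]" report line. */
final class HostReportLine {
    final long urls;
    final long bytes;
    final String host;

    private HostReportLine(long urls, long bytes, String host) {
        this.urls = urls;
        this.bytes = bytes;
        this.host = host;
    }

    /**
     * Parses a data line such as "10 41802 www.etracker.de".
     * Throws IllegalArgumentException for lines that do not fit, including
     * the "[#urls] [#bytes] [host]" header, which callers should skip.
     */
    static HostReportLine parse(String line) {
        // Limit of 3 keeps the final column intact.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + line);
        }
        // Long.parseLong throws NumberFormatException, a subclass of
        // IllegalArgumentException, when a count is not numeric.
        return new HostReportLine(
                Long.parseLong(parts[0]), Long.parseLong(parts[1]), parts[2]);
    }
}
```

With the numeric columns on the left, a tool that only needs the counts can stop after the first two fields and never has to deal with the variable-length host.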
2035
Yes. The primary aim was to place the most important and compact numeric info to the left, where it would not scroll/line-wrap off the right margin. (The...
Gordon Mohr
gojomo
Jul 5, 2005
9:36 pm
2036
I got Heritrix compiled with Maven and now the crawler is running too. Thanks for your help. I also imported Heritrix into Eclipse. It's a great visual way to...
Kaisa Kaunonen
kaisa_kaunonen
Jul 6, 2005
9:05 am
2037
Kaisa, you'll need to configure Eclipse for Java 1.4 compliance to get rid of the assert errors (prior to Java 1.4, 'assert' was not a keyword but currently...
Kristinn Sigurdsson
kristsi25
Jul 6, 2005
9:20 am
2038
Hi, I have been using Heritrix 1.2 with an order file containing the entries <integer name="max-link-hops">1</integer> <integer...
Holger Stenzhorn
holgerstenzhorn
Jul 7, 2005
8:06 am
2039
Dear all, I was happily crawling the web when I got an out-of-memory error and Heritrix hung. I tried to restart the crawl through the recovery...
Marco Baroni
kumaraja2000
Jul 7, 2005
1:41 pm
2040
Hi, I have experimented a bit and added log output to the classes CrawlScope for 1.2 and ClassicScope for 1.4 which I have attached to this mail. I can see...
Holger Stenzhorn
holgerstenzhorn
Jul 7, 2005
1:42 pm
2041
... No, we wouldn't. It's one jar file that depends on another, so it doesn't get compiled. ... A derivative of JavaDepend ...
Lars Clausen
lrclause
Jul 7, 2005
1:54 pm
2042
... The tail of the recover log is an incomplete compression block; the crash interrupted the writing of the compressed recover log. Because of this, gzip is...
stack
stackarchiveorg
Jul 7, 2005
2:22 pm
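Message 2042 attributes the problem to a gzip stream cut off mid-block by the crash. The advice in that message is truncated here, so the following is only one possible way to salvage the readable part of such a file, using the standard `java.util.zip.GZIPInputStream`. The class name `TruncatedGzipReader`, the method `salvageLines`, and the assumption that the log is line-oriented UTF-8 text are illustrative, not taken from the thread.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: reads as many complete lines as a truncated gzip stream allows. */
final class TruncatedGzipReader {
    private TruncatedGzipReader() {}

    static List<String> salvageLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (EOFException truncated) {
            // The compressed data ended before the stream was complete.
            // Keep the complete lines read so far; a partial last line is dropped.
        }
        return lines;
    }
}
```

The salvaged lines could then be written out to a fresh, complete file. Other I/O errors are deliberately left to propagate, since only the unexpected end of input is expected after a crash.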
2043
Thanks, I will follow your advice! Marco...
Marco Baroni
kumaraja2000
Jul 7, 2005
2:27 pm