Hi Folks, We have our own implementation of the crawl farm built on top of Heritrix 1.6. We moved to version 1.13 some time back. As part of our work we have...
Sounds really great Ankur. How has it been working for you? What size clusters have you been running with? Out of interest, did you use the CHF from the mg4j...
We'd be very excited to integrate this code! Does it require changes to core classes, or is it largely self-contained in its own new classes? (Especially with...
Hi, If you guys need help integrating this into 1.14, I'd be happy to give a hand. ... self-contained ... such a ... 2.x. ... implemented ... responsible ... ...
Hi guys, I followed these guidelines http://webteam.archive.org/confluence/display/Heritrix/Setting+up+the+new+Heritrix+in+Eclipse and got Heritrix built...
Hi, I am having trouble crawling authenticated (password protected) pages. I am using RFC Realm based authentication. Here is the log information I got: ...
Hi, is there a reason Heritrix uses version 2.3 of the Jericho HTML parser instead of the current 2.5 version? Version 2.5 is available in the maven repository...
It does not require any changes to the core classes, as behaviour is largely modelled in terms of additional classes that integrate with Heritrix directly and...
Michael, We have been running the crawls in our dev environment, which is a cluster of 7 crawlers. Some of the other groups have been using it for larger...
Are there URIs with the scheme "dns://" in web pages? Where does Heritrix get the address of the DNS host for DNS lookups? Why FetchDNS? What's the Class of...
calvin.he.84@...
May 5, 2008 10:03 am
5174
Hi All, I have been using Heritrix-2.0.0 for web crawling. Recently I tried changing the Queue Assignment Policy from the default ...
Pratyush Banerjee
Pratyushbanerjee@...
May 5, 2008 4:12 pm
5175
I haven't seen this before. I presume you mean 'mvn install'? What's your OS, java, and maven versions? Does trying again with "-e" show more details? - Gordon...
Thanks, Gordon, for your prompt reply. I already tried the changes you suggested, but that does not solve the problem. I tried making the Key public and...
Pratyush Banerjee
Pratyushbanerjee@...
May 6, 2008 7:54 am
5178
I suspect this is a different issue entirely. Have you made other changes to the default configuration, besides specifying a BucketQueueAssignmentPolicy? Or is...
I didn't want to risk any changes to robots handling in the final days leading to release 1.14.0, but have since implemented support for the 'Crawl-Delay' and...
Hi Gordon, Initially my application crashed due to selection of the BucketQueueAssignmentPolicy. Obviously I had changed other default options like the ...
Pratyush Banerjee
Pratyushbanerjee@...
May 7, 2008 6:10 am
5181
I was able to reproduce the problem starting from a working, nearly-default configuration and substituting BucketQueueAssignmentPolicy via the web UI's...
Hi Gordon, Thanks for your effort. I tried doing what you described, but Heritrix-2.0.0 still throws the same errors as before. I modified the global...
Pratyush Banerjee
Pratyushbanerjee@...
May 7, 2008 10:26 am
5183
Gordon, After quite a few agonizing hours I think I found out what was causing such a horrible crash in the system. While debugging the Heritrix-2.0.0 code I...
Pratyush Banerjee
Pratyushbanerjee@...
May 7, 2008 1:16 pm
5184
I am on Ubuntu 8.01, Java 6.0, maven2. When I tried "mvn install" from command line in /heritrix/dist, it built fine ... [INFO] ... [INFO] BUILD SUCCESSFUL ...
I discovered why I could not reproduce your error. Although I had changed the root:queue-assignment-policy to be a BucketQueueAssignmentPolicy, the frontier...
Thanks, Gordon, for so much of your time. Meanwhile I will have a look at the SVN 2 heritrix trunk. Thanking you, Pratyush ... From: Gordon Mohr...
Pratyush Banerjee
Pratyushbanerjee@...
May 8, 2008 6:01 am
5188
I'm just trying to run the basic example for the hdfs writer crawler and, no matter what I try, I keep getting "Connection refused". Any ideas? alerts.log for...
Ryan Smith
ryan.justin.smith@...
May 8, 2008 11:30 pm
5189
I'm still unsure of the cause of this problem. However, if you can maven build Heritrix outside Eclipse, that may be enough. Once a maven-build has brought all...
I get this same issue in eclipse 3.3.2, ubuntu 7.10, m2eclipse plugin...
Ryan Smith
ryan.justin.smith@...
May 9, 2008 4:18 pm
5191
Do you mean that, as for Khanh, the maven build works fine for you outside Eclipse, but inside Eclipse fails with an error about the jetty-6.0.2 POM? What...
Indeed. I can build from command line, but eclipse fails the distribution part. ====== $ mvn -v Maven version: 2.0.9 Java version: 1.6.0_05 OS name: "linux"...