archive-crawler
Messages 5091 - 5120 of 6142
5091
I would like to be able to connect to an RDBMS while Heritrix is running. I was able to do this with 1.13 by deploying it into JBoss and allowing JBoss to manage...
gkbrown22
Apr 1, 2008
11:01 am
5092
I'm bringing up this issue again. First of all, we're now doing repeatable 5B+ crawling using Heritrix. Thanks to you guys. These days we've got quite a few...
joehung302
Apr 1, 2008
7:55 pm
5093
... How do you receive your complaints?...
nfoscarini
Apr 1, 2008
8:18 pm
5094
You should always include contact info at the URL Heritrix requires you to add to your User-Agent. - Gordon...
Gordon Mohr
gojomo
Apr 1, 2008
10:42 pm
5095
... We put our crawler "signature" in Heritrix when we crawl the Internet. The signature is specified in the order.xml file. Heritrix, as a user agent, will...
joehung302
Apr 1, 2008
11:20 pm
5096
Hi Gordon and nfoscarini: Thanks for your help. I am indeed running on Ubuntu Linux. Ubuntu (and Debian) now have full support for Sun's Java, and I have it...
Gordon Paynter
Gordon.Paynter@...
Apr 2, 2008
2:53 am
5097
This is an interesting topic, because I have not come across any Heritrix users (before now) who have been contacted. I have been crawling for months now with...
nfoscarini
Apr 2, 2008
2:23 pm
5098
... As this only requires shuttling an externally-provided number into the existing min-delay-ms setting, it should be pretty simple, but was overlooked as we...
Gordon Mohr
gojomo
Apr 2, 2008
11:46 pm
5099
Hello folks, We are in the starting phase of a project, and we are currently wondering whether Heritrix or Nutch is the best choice of crawler for us. Our...
Svein Yngvar Willassen
svein@...
Apr 5, 2008
9:36 pm
5100
Hey all, I just wanted to let everyone know about something that has happened at my work that I've dubbed "the Heritrix effect." We've used Heritrix on several...
Micah Wedemeyer
mwedeme@...
Apr 7, 2008
2:32 pm
5101
... wondering ... I've never used Nutch, but I think Heritrix is more flexible given its plug-in design. ... You intend to control your crawlers using Hadoop?...
nfoscarini
Apr 7, 2008
3:34 pm
5102
... Congrats. Does your new place have a website?...
nfoscarini
Apr 7, 2008
3:36 pm
5103
... No, we just want to use Hadoop to store and process the data from the crawlers. ... Heritrix Cluster Controller (hcc)? I've had a look at it, but must ...
Svein Yngvar Willassen
svein@...
Apr 7, 2008
4:28 pm
5104
... No, I don't think it was hcc. I don't think hcc is under development anymore, and I never really understood what problem hcc was supposed to solve. Arg.....
nfoscarini
Apr 7, 2008
4:55 pm
5105
Thank you for your help. NetarchiveSuite certainly looks worth taking a closer look at. Best Regards, Svein ...
Svein Yngvar Willassen
svein@...
Apr 7, 2008
5:44 pm
5106
... No. HCC is just a simple tool for addressing a herd of heritrice as one; start/stop/monitor, etc. If you look back over the heritrix archives, you'll see...
stack
stackarchiveorg
Apr 7, 2008
6:13 pm
5107
... Yeah, you can check them out at: http://vitrue.com/ They concentrate on video-centric social media and bringing it to companies that want to generate buzz...
Micah Wedemeyer
mwedeme@...
Apr 7, 2008
9:27 pm
5108
Hi all: I am trying to crawl a website that requires a POST with some credentials as well as a challenge string. My first question is this: How can I (using...
bif_tannen
Apr 8, 2008
12:22 am
5109
Many congratulations, buddy :-) Hope you have a good time ...
Goel, Ankur
ankur_goel79
Apr 8, 2008
12:47 pm
5110
... I'm sure the web developer implemented the challenge form to prevent exactly what you're trying to do....
nfoscarini
Apr 8, 2008
1:50 pm
5111
Hi Folks, Do we have a wiki page describing the best practices and useful tips when setting up multiple Heritrix crawlers for doing large crawls? Such a page...
Goel, Ankur
ankur_goel79
Apr 9, 2008
9:10 am
5112
Hi, Do we have an architecture document illustrating architectural changes when moving from 1.x to 2.x? To start with, a 1 or 2 page document with labelled...
Goel, Ankur
ankur_goel79
Apr 9, 2008
9:21 am
5113
Where do I configure proxy servers in Heritrix? Or do I need to develop a plug-in, or modify some source code? Thanks!...
何翔
calvin.he.84@...
Apr 9, 2008
9:10 pm
5114
I want to crawl the website of a given domain. When I start Heritrix with the default settings, it only starts one thread and crawls very slowly. Later I learned that it...
何翔
calvin.he.84@...
Apr 9, 2008
9:10 pm
5115
Hi, I was trying to run Heritrix-2.0.0 on Windows yesterday and ran into a slightly different problem. I followed the instructions for the jmxremote.password...
low_fi_db
Apr 9, 2008
9:11 pm
5116
Hi, I'm trying to get the server started (using Ubuntu Server) and access its web interface. Launching apparently worked (messages end with "Web UI listening...
bernardsirius
Apr 9, 2008
9:12 pm
5117
At Wed, 09 Apr 2008 18:59:10 -0000, ... The web UI is listening on the localhost (127.0.0.1), not the external interface (in your case, 67.207.136.21). You...
Erik Hetzner
e_hetzner
Apr 10, 2008
12:29 am
5118
Great! It works now. I had tried that, but with the wrong syntax (-b=...). Thank you. My next problem is that I'm supposed to find something at...
Bernard Sirius
bernardsirius
Apr 10, 2008
7:56 am
5119
... Hi Bernard. Try connecting directly to http://67.207.136.21:8080/. For me this works. I can connect to your web administrative console from here. The path...
Christian Krumm
chuk_ol
Apr 10, 2008
8:25 am
5120
What is the meaning of "SURT prefixed" in the Heritrix documentation?...
何翔
calvin.he.84@...
Apr 16, 2008
2:11 am
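
Message 5117 above points out that the web UI was listening on localhost (127.0.0.1) rather than on the external interface. The difference can be illustrated with a minimal Java sketch that uses only the standard java.net API. This is not Heritrix code: the class name BindDemo and the helper listenOn are hypothetical, and the sketch only shows how a listening socket bound to the loopback address differs from one bound to all interfaces.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindDemo {
    /**
     * Opens a listening socket on an OS-chosen free port.
     * Passing null binds to the wildcard address (all interfaces).
     * Hypothetical helper for illustration only.
     */
    static ServerSocket listenOn(InetAddress addr) throws IOException {
        ServerSocket socket = new ServerSocket();
        socket.bind(new InetSocketAddress(addr, 0)); // port 0: let the OS pick
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket local = listenOn(InetAddress.getLoopbackAddress());
             ServerSocket all = listenOn(null)) {
            // Bound to loopback: reachable only from the same machine.
            System.out.println("loopback only: "
                    + local.getInetAddress().isLoopbackAddress());
            // Bound to the wildcard address: reachable on every interface,
            // including an external IP, subject to firewall rules.
            System.out.println("all interfaces: "
                    + all.getInetAddress().isAnyLocalAddress());
        }
    }
}
```

A server that binds the way `local` does will refuse connections arriving on the machine's public address, which matches the symptom described in message 5116.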