archive-crawler
Messages

Messages 282 - 311 of 6140   Oldest  |  < Older  |  Newer >  |  Newest
282
Dear All. Here are some comments on the README.txt (version 1.10) in the distribution. (1) In section 1,2, I suggest we mention how to do this assignment by...
Søren Vejrup Carlsen
svc400
Mar 1, 2004
5:25 pm
283
I'm working on character encodings in Heritrix. I made a proposal for addressing our current blindness to anything other than single-byte character sets in ...
Michael Stack
stack@...
Mar 2, 2004
6:19 pm
284
We have begun discussing how the crawler could revisit already crawled URIs. Some initial thoughts have been written on the Wiki <url: ...
John Erik Halse
johnerikhalse
Mar 6, 2004
12:51 am
285
Hi all, By default all JSP pages are recompiled each time you launch Heritrix. This is of course very annoying. But if you create a directory called 'work' in...
Kristinn Sigurðsson
kristsi25
Mar 10, 2004
7:51 pm
286
Update on this issue. A recent change (post 0.6.0) includes a directive to Jetty to store the compiled JSP pages in a fixed location. That means that the 'work'...
Kristinn Sigurðsson
kristsi25
Mar 26, 2004
7:02 pm
287
Dear all. I have just been looking at your dependencies page. Don't the "servlet", "jasper-runtime", and "jasper-compiler" come from the Apache...
Søren Vejrup Carlsen
svc400
Mar 26, 2004
8:02 pm
288
Here's a proposal for the Heritrix negotiation of authentication schemes feature: http://crawler.archive.org/proposals/auth/ Would love feedback if any. Will...
Michael Stack
stack@...
Mar 26, 2004
11:10 pm
289
I have been playing around with heritrix for a few weeks now and I am in the process of turning it loose on a controlled environment for one of my research...
sebastiandelachica
sebastiandel...
Mar 29, 2004
9:10 pm
290
Currently you can build Heritrix with Maven and Ant. The Maven build is more complete in that it generates all of the documentation and the site at ...
Michael Stack
stack@...
Mar 29, 2004
9:40 pm
291
Thanks for trying Heritrix, Seb. See below. ... Do you mean 0.4.0? You say 0.9.0 above. We just released 0.6.0 on Friday. Try it if you haven't already. Lots...
Michael Stack
stack@...
Mar 29, 2004
10:11 pm
292
Michael, Thanks for the prompt reply. Indeed I upgraded to 0.6.0 over the weekend: amazing how close a 9 looks to a 6 given enough lack of sleep. In English (to...
sebastiandelachica
sebastiandel...
Mar 30, 2004
12:23 am
293
Hei Seb. See below ... If you are only crawling one site at a time (using DomainScope), the max-bytes-download is just the thing for you. It limits the total...
Kristinn Sigurðsson
kristsi25
Mar 30, 2004
12:51 am
294
Dear St.Ack. I have a comment on assumption 3 (section 1.2.3): "No means of recording credentials used authenticating in an ARC". But shouldn't there be a means...
Søren Vejrup Carlsen
svc400
Mar 30, 2004
9:51 am
295
Hiya Kris, Thanks for the hint. That is pretty much where I ended up last night. To clarify, my original intent was to manage multiple sites from a single crawl...
sebastiandelachica
sebastiandel...
Mar 30, 2004
3:31 pm
296
... Yes Søren. It needs to be addressed but it probably won't be before first delivery of this new Heritrix feature. Thanks for the feedback. St.Ack...
Michael Stack
stack@...
Mar 30, 2004
4:30 pm
297
See below ... We are also aware of the fact that you can't overload websites and that is why the crawler is very polite. If you look at the settings under...
Kristinn Sigurðsson
kristsi25
Mar 30, 2004
4:43 pm
298
... The versions of jasper*.jar and servlet*.jar checked into heritrix came from the Jetty 4.2.17 bundle. Rather than our going via the middleman, Jetty, I...
Michael Stack
stack@...
Mar 30, 2004
11:06 pm
299
It seems that Seb's concern is not just politeness but the number of bytes downloaded from sites. Some ISPs will charge you an arm and a leg if you exceed given...
Igor Ranitovic
iranitovic
Mar 30, 2004
11:16 pm
300
Also, RFE 891986 added a bandwidth-throttle facility, which I believe can be set per host. John Erik, can you say more about this capability? - Gordon...
Gordon Mohr
gojomo
Mar 30, 2004
11:27 pm
301
The bandwidth-throttle facility consists of two different settings. One sets the maximum average bandwidth the crawler is allowed to use. The other...
John Erik Halse
johnerikhalse
Mar 30, 2004
11:52 pm
302
Hi ! I updated my HERITRIX installation from CVS - and now I can't crawl at all - I get alerts on every try: Could someone tell me whether the CVS version is...
bja@...
bjarne_dk2000
Apr 6, 2004
12:36 pm
303
Hello Bjarne. I just tried a build from HEAD and all seems to work fine. Perhaps your order file is from a previous version and the newer code has trouble ...
Michael Stack
stack@...
Apr 6, 2004
5:08 pm
304
The alerts all came up in the UI, when configuring HERITRIX from inside the UI (using the Simple Profile). I returned to the official release 0.6.0 - it works...
bja@...
bjarne_dk2000
Apr 7, 2004
7:41 am
305
Hello! Does HERITRIX handle cookies? In the UI there are two text fields for saving and loading a cookie file. When the crawler runs - does it save cookies...
bja@...
bjarne_dk2000
Apr 7, 2004
7:45 am
306
Hi, I collected a nice test archive of about 100000 docs with heritrix 0.6.0 I think it went well (I didn't yet try out very baaad web sites;) Now I try to...
kaisa_kaunonen
Apr 7, 2004
11:13 am
307
... Yes it does. Handling of cookies is done by default. The load cookies option allows an operator to pre-load an existing cookies file (in the Netscape's ...
Igor Ranitovic
iranitovic
Apr 7, 2004
6:14 pm
308
Hello everyone, I'm trying to use Heritrix on a Windows(!) platform. Whenever I submit a job via the web interface I get an error - here is the log (alert)...
thomasschmegg
Apr 13, 2004
12:23 pm
309
Thomas: Which version of heritrix? Is it a release or cvs HEAD? Thanks, St.Ack...
stack
stack@...
Apr 13, 2004
3:21 pm
310
... oh, sorry for the lack of that information. i'm using version 0.6.0 which i have downloaded from the heritrix homepage. ... windows?...
thomasschmegg
Apr 14, 2004
11:27 am
311
Hi ! We are testing HERITRIX in connection with harvesting specially selected websites - when harvesting only one website (on only one host / domain) the...
bja@...
bjarne_dk2000
Apr 14, 2004
7:26 pm