archive-crawler
Messages 1453 - 1482 of 6142
1453
You might want to take a look at the Automated Revisiting Module being developed at the moment by kris@... It does implement a new Frontier including a...
Bjarne Andersen
bjarne_dk2000
Feb 1, 2005
8:53 am
1454
Yes, the AR module (currently available as a branch of the Heritrix project, http://crawltools.archive.org:8080/cruisecontrol/buildresults/BRANCH-heritri ...
Kristinn Sigurdsson
kristsi25
Feb 1, 2005
9:09 am
1455
Kris, I was aware of your AR module and should have asked a couple questions about it in that earlier post. The algorithm I suggested could be written as a...
John R. Frank
tamarind473
Feb 1, 2005
3:10 pm
1456
Hey John, Partly the reason for a separate frontier is one of parallel development. When I started working on it there was no BDBFrontier and there Some other...
Kristinn Sigurdsson
kristsi25
Feb 1, 2005
3:33 pm
1457
... Would be great if you could confirm that you are indeed getting better results. ... Can you cite the section in rfc2616 where it says this please Dave (I ...
Michael Stack
stackarchiveorg
Feb 1, 2005
5:42 pm
1458
I have a stored profile that I use for all of my crawls that contains the elaborate (well, that may be an overstatement, but I do tweak a lot of knobs and add...
Tom Emerson
tree02139
Feb 1, 2005
8:20 pm
1459
... see private reply to your archive.org address ... see sections 15.1.2 and 15.1.3 I'm basing my comment on the following from 15.1.3 Clients SHOULD NOT...
Dave Skinner
frodobay
Feb 1, 2005
8:20 pm
1460
... Smile. I'll try it for you Tom if you want to send me your profile -- because it should just work (I'd like to see what issues we run into). St.Ack...
stack
stackarchiveorg
Feb 1, 2005
8:38 pm
1461
Executing the following command: /home/apps/heritrix/bin/heritrix -n /home/apps/heritrix/conf/profiles/testProfile/order.xml With the following order.xml...
Rich Collins
richwcollins
Feb 1, 2005
8:47 pm
1462
It appears that Heritrix does not use the system hosts file (/etc/hosts). Is this correct? -- Rich Collins Director of Information Technology InjuryBoard.com ...
Rich Collins
richwcollins
Feb 1, 2005
10:24 pm
1463
... I tried it. There was one issue where an old default didn't make the transition nicely so I added code to handle this in HEAD. Otherwise, there is one...
stack
stackarchiveorg
Feb 1, 2005
10:52 pm
1464
... Heritrix uses the dnsjava library to do its lookups. See 'Limitations' section in http://www.xbill.org/dnsjava/README for description of how it goes about...
stack
stackarchiveorg
Feb 1, 2005
11:05 pm
1465
I've got a midfetch filter that looks at the content-length and last-modified headers and pretends that a disk directory structure (as could be produced by...
Dave Skinner
frodobay
Feb 2, 2005
12:09 am
1466
Every time, I use a file-compare program like Beyond Compare to compare the old order.xml with the new one, and change the differences manually :( Ansi...
ansi
mymaillist@...
Feb 2, 2005
12:43 am
1467
Hi Kris, Help me get up to speed with what you're thinking here. I'm obviously totally new here, so take my questions as interest, not argument. ... A duplicate...
John R. Frank
tamarind473
Feb 2, 2005
1:12 am
1468
The documentation only hints at how to use the SurtPrefixScope. I'm assuming that the surt patterns do not go into the seeds file. I assume I add a...
Tom Emerson
tree02139
Feb 2, 2005
1:14 am
1469
... Yeah. Needs some work. ... I think your difficulty is not seeing the three little configuration options surts-source-file, seeds-as-surt-prefixes, and...
stack
stackarchiveorg
Feb 2, 2005
1:42 am
1470
... [...] Good lord, I completely missed these. And here I was thinking that I had scanned all of the entries on the page. Even not moving them in the WUI but...
Tom Emerson
tree02139
Feb 2, 2005
4:40 am
1471
OK, so I'm still a bit hosed because my surt prefix (meant to mimic path scope) prevents the site's robots.txt file from being read, and then I get a ream of...
Tom Emerson
tree02139
Feb 2, 2005
5:10 am
1472
Hi, I am new to Heritrix and tried to run a first crawl job, but I get the following error log: java.net.ConnectException: Connection refused at...
innfang
Feb 2, 2005
5:43 am
1473
... You might consider broadening the key to accommodate timestamp or you might put your timestamp in place of the key tail ordinal of 64 bits (You may want to...
Michael Stack
stackarchiveorg
Feb 2, 2005
5:50 am
1474
... Does the server you are trying to connect to exist? Can you get there with a browser (from the machine you're crawling from)? Yours, St.Ack...
Michael Stack
stackarchiveorg
Feb 2, 2005
6:00 am
1475
... Sorry about that Tom. I added a note under 'surtprefixscope' that says: 'When you use this scope, it adds 3 hard-to-find-in-the-UI attributes -- ...
Michael Stack
stackarchiveorg
Feb 2, 2005
6:06 am
1476
Tom Emerson wrote: ... A commit I made earlier today was supposed to avoid your seeing this exception. Did you update recently? If not, try setting your ...
Michael Stack
stackarchiveorg
Feb 2, 2005
6:12 am
1477
... I updated right after your note saying you fixed this, but the change must not have percolated to the anonymous server. Once I made the 'max-length-bytes'...
Tom Emerson
tree02139
Feb 2, 2005
6:29 am
1478
... Indeed, a path scope crawl using the same seed has crawled over 1600 documents so far and is only "23%" complete. Shouldn't the surt pattern provide the...
Tom Emerson
tree02139
Feb 2, 2005
6:55 am
1479
... To a point, yes, but a repeating Frontier may be interested in rediscovered URIs. I.e. if a new or changed page embeds another document, we may want to ...
Kristinn Sigurdsson
kristsi25
Feb 2, 2005
8:24 am
1480
... Would it be difficult to have the WUI check that you enter a valid user agent and from string? Or is the check too complex to put in a place like that?...
Lars Clausen
lrclause
Feb 2, 2005
11:04 am
1481
I believe that the WUI prints a red star next to the setting if it is invalid. Maybe it should be more forceful, but at the time the functionality was added, it...
Kristinn Sigurdsson
kristsi25
Feb 2, 2005
12:28 pm
1482
Hi, Thanks for the reply. I just realized that I have to specify the proxy host and port. May I know where I should specify it? Inn Fang ... get ... Source) ...
Inn Fang
innfang
Feb 2, 2005
1:17 pm