archive-crawler
Messages

Messages 1064 - 1093 of 6142
1064
In this case, it's not the crawler that's sending any email, but the server-side script it's hitting, which is specified in the FORM ACTION attribute. For...
Gordon Mohr (Internet...
gojomo
Oct 1, 2004
8:55 pm
1065
Last week we upgraded the heart of Heritrix, the HttpClient lib, from release 2.0 to 3.0. It's 3.0 alpha2, but by all accounts it's stable, and that's been our...
stack
stack@...
Oct 4, 2004
5:36 pm
1066
... Yes, I've been running my test crawl (the one that died and whose log I sent you) for the last 10 hours using an update against HEAD I made around 07h30...
Tom Emerson
tree02139
Oct 4, 2004
9:41 pm
1067
... I ran a Heritrix instance using BdbFrontier yesterday. It has 'Downloaded 1163658 documents in 23 h., 15 min. and 20 sec'. No KeyQueue naming errors now. ...
ansi
mymaillist@...
Oct 6, 2004
4:18 am
1068
... Good stuff. We'll try not to break it again (smile). ... Thanks for the alerts, Ansi. Looks like your instance is using 'SingleHttpConnectionManager'...
stack
stack@...
Oct 6, 2004
5:22 pm
1069
I have an ARC file (generated from an ongoing crawl using a recent HEAD snapshot) that causes the arcreader to die: Exception in thread "main"...
Tom Emerson
tree02139
Oct 8, 2004
7:57 pm
1070
Is there a specification for the SURT beyond the paragraph in the User's Guide? Could you add real examples to the manual? Thanks. -tree -- Tom Emerson...
Tom Emerson
tree02139
Oct 8, 2004
8:27 pm
1071
... Can I have your ARC Tom? Might give clue on how the records were constructed. Thanks, St.Ack P.S. You got note on questions for the 'archive pass'...
stack
stack@...
Oct 8, 2004
9:00 pm
1072
... Gordon was talking up SURT at LoC this week (Library of Congress). I'll get him to add his notes into manual (You've seen the issue? It has some good...
stack
stack@...
Oct 8, 2004
9:07 pm
1073
I get this Exception, too :( Ansi...
ansi
mymaillist@...
Oct 9, 2004
1:00 am
1074
Is there a Maven goal I can use to build just enough to test Heritrix, including the Web GUI? By test I mean configure and submit a job, and run a short...
tztwh
Oct 11, 2004
4:20 pm
1075
... 'maven jar' will build the jar only. It does the unit tests, which you probably want, but not all of the other doc. generation. The jar gets created...
stack
stack@...
Oct 11, 2004
6:24 pm
1076
I am interested in using the Heritrix extractors to pull links from HTML documents. The problem is that in addition to the links, I need to know the position...
mycourtjester
Oct 12, 2004
3:16 pm
1077
... It's not currently supported. To know the position of each link in a page, you'll need to doctor each of the extractors you're interested in to log the link...
stack
stack@...
Oct 12, 2004
6:43 pm
1078
The link on the main heritrix page (http://crawler.archive.org/articles/user_manual.html) just brings up a blank page. I was hoping to find some documentation...
robeger
Oct 13, 2004
3:07 pm
1079
Something probably went wrong with the auto generation during the most recent build. I'm sure Michael will fix it once he gets in. In the meantime you can use...
Kristinn Sigurdsson
kristsi25
Oct 13, 2004
3:16 pm
1080
... Should be fixed by the time ye get this mail (bad src XML). Thanks for pointing it out. ... Yeah, it's a new feature, as Kris says. Here's the little note...
stack
stack@...
Oct 13, 2004
4:45 pm
1081
robeger writes: [...] ... What in particular would you like to know? I wrote the filter, so ask away. ;-) -tree -- Tom Emerson...
Tom Emerson
tree02139
Oct 13, 2004
4:48 pm
1082
I was looking at your notes on http://www.dreamersrealm.net/tree/blog/2004/08/19/#html_only about it. Sounds like what I want to do - just grab text content....
robeger
Oct 13, 2004
5:12 pm
1083
Hi all, Are the crawl.log fields described somewhere? I figured it out myself and wrote my own doc by reading the code after not finding anything in the...
Tom Emerson
tree02139
Oct 13, 2004
5:34 pm
1084
... The user manual has a coarse description. See '8.2.1. crawl.log' in http://crawler.archive.org/articles/user_manual.html. It could be tightened up. Send...
stack
stack@...
Oct 13, 2004
8:01 pm
1085
... The above sounds like a decent tactic. Leaving off the pre-fetch filter would mean that you'd do content-type checks only. Might be more suited to your...
stack
stack@...
Oct 13, 2004
8:17 pm
1086
... Fixed in HEAD: https://sourceforge.net/tracker/index.php?func=detail&aid=1045736&group_id=73833&atid=539099. St.Ack...
stack
stack@...
Oct 14, 2004
1:37 am
1087
... Of course when I went looking the User Manual wasn't available online yet. What I ended up with is pretty much what's there. I would find it more readable...
Tom Emerson
tree02139
Oct 14, 2004
3:36 pm
1088
... [...] ... I don't think you need the mid-fetch filter, but I may be missing something. ... Yes, one regexp will give you better performance. The one stack...
Tom Emerson
tree02139
Oct 14, 2004
3:42 pm
1089
... General plan is to build a meaty glossary and then mess with xinclude to duplicate the meaty snippets throughout the docs (Haven't gotten to the xinclude...
stack
stack@...
Oct 14, 2004
4:16 pm
1090
I'm trying to wrap my head around the following observations about the seeds in a crawl I did. - The original seed list has 280 URLs. - The seed list after the...
Tom Emerson
tree02139
Oct 14, 2004
11:51 pm
1091
Hi Tom, First of all, you should definitely care about these kinds of discrepancies. It is very important that all reports are accurate and that they make sense....
Igor Ranitovic
iranitovic
Oct 15, 2004
9:37 pm
1092
Hi, I got someone to install Heritrix on a machine for me. I just gave them the user manual link. Following the instructions in there they simply installed a...
Williamson, Mark
Mark.Williamson@...
Oct 16, 2004
7:28 am
1093
... That's interesting, Mark. It works with full SDK? Make a bug and I'll fix the manual. This is 1.0.4? Is it running the selftest when this happens because...
stack
stack@...
Oct 16, 2004
3:42 pm