archive-crawler
Messages

  Messages Help
Advanced
Messages 2 - 31 of 6142
2
I've been looking at what crawlers have typically done, and considering what we'd like the new crawler to do. The following general outline -- in roughly valid...
Gordon Mohr
gojomo
Feb 10, 2003
7:19 pm
3
Gordon, I assume that the worker thread is doing synchronous I/O. We are not sure yet on what mode we have finalized, synchronous I/O or asynchronous I/O? The...
G.B.Reddy
gbreddysoft
Feb 12, 2003
5:45 pm
4
Hi, Reddy! ... Yes, this outline most easily maps to blocking I/O. Per a discussion last Thursday, we'd initially like to get up and running with the familiar...
Gordon Mohr
gojomo
Feb 12, 2003
7:08 pm
5
We now have a project at SourceForge for hosting our source code; see the details below. I definitely want to use their CVS, and perhaps their bug/ ...
Gordon Mohr
gojomo
Feb 13, 2003
6:58 pm
6
From a number of sources, I've been hearing about tricky crawler situations -- misbehaving or malicious servers, endless domains, difficult-to-extract link...
Gordon Mohr
gojomo
Feb 18, 2003
8:05 pm
7
At our last design meeting, Raymie and I sketched an outline of crawler operation as a series of discrete stages connected by queues -- a style compatible with...
Gordon Mohr
gojomo
Feb 19, 2003
10:21 am
8
[cc'd to the archive-crawler@yahoogroups.com discussion list] These are all important matters to address -- and for most of these issues, I think there will be...
Gordon Mohr
gojomo
Feb 21, 2003
7:16 am
9
... I said "_not_ RAM" Gordon said "swappable strategies will be enabled, starting with a simple RAM-based approach to get the crawler testable for small...
Raymie Stata
rstata
Feb 21, 2003
7:29 am
10
I don't think we can build the best mega-scale crawler until after we've built a really good, modular, efficient small-scale crawler. That's how the existing...
Gordon Mohr
gojomo
Feb 21, 2003
4:17 pm
11
got it. all cleared up today at the meeting, I think. good start! -brewster...
Brewster Kahle
brewsterkahle
Feb 21, 2003
11:04 pm
12
[CC'ing to archive-crawler@yahoogroups.com] ... This looks like a good first cut. I'm still working to improve my understanding of the best way to use the...
Gordon Mohr
gojomo
Feb 22, 2003
12:17 am
13
Gordon and Raymie, Here goes the proposal for the asynchronous DNS lookup API implementation. We shall implement a minimal resolver which is capable of sending...
G.B.Reddy
gbreddysoft
Feb 27, 2003
4:55 pm
14
At our kickoff engineering review meeting last Friday, most discussion centered around understanding and clarifying the requirements document. Key areas...
Gordon Mohr
gojomo
Feb 28, 2003
6:11 pm
15
Sounds like a reasonable plan. By "local name server" do you mean something *very* local -- for example, a standard nameserver we run on the same machine? That...
Gordon Mohr
gojomo
Feb 28, 2003
9:13 pm
16
Yes, it is a local name server. It could also be remote. -Reddy ... From: Gordon Mohr To: archive-crawler@yahoogroups.com Cc: Raymie Stata ;...
G.B.Reddy
gbreddysoft
Mar 3, 2003
1:19 pm
17
Driven by our meeting with Raymie last Thursday, and refined by further analysis, here are some notes on our design directions. = STAGED CRAWLER DESIGN NOTES =...
Gordon Mohr
gojomo
Mar 5, 2003
10:29 pm
18
Gordon and Raymie, Below are the various stages and their design with the issues involved in the DNS Resolver and HTTP Client implementation. DNS History/Cache...
G.B.Reddy
gbreddysoft
Mar 6, 2003
5:52 pm
19
Patrick Eaton forwarded me a pair of staged HTTP client implementations which are part of the OceanStore project at Berkeley, and are essentially what are also...
Gordon Mohr
gojomo
Mar 7, 2003
1:40 am
20
I've just checked into Sourceforge CVS the module 'Anecdote', a first stab at a staged crawler. Right now it just sets up dummy printing stages, grabs a list...
Gordon Mohr
gojomo
Mar 7, 2003
2:11 am
21
More insight on the DNS stages. As stated in the design earlier, "DNS Querying Stage", "DNS Response Processing Stage" and "Timeout and Retry Handling Stage"...
G.B.Reddy
gbreddysoft
Mar 7, 2003
4:31 pm
22
Gordon, Igor, Raymie present. (1) Access to work in progress: start using SourceForge CVS (Post meeting note: 2 modules now exist there: 'Anecdote', a staged...
Gordon Mohr
gojomo
Mar 7, 2003
9:44 pm
23
I added very dumb HTTP fetching to the Anecdote 'Fetching' stage via the Apache Commons HTTPClient library soon after my message yesterday. ... This spinning...
Gordon Mohr
gojomo
Mar 7, 2003
9:51 pm
24
These are good decompositions of the steps involved, and the LGPL dnsjava library looks very useful for our needs. My tendency would be to think fewer stages...
Gordon Mohr
gojomo
Mar 7, 2003
11:30 pm
25
Gordon, I am done with the asynchronous DNS code. I shall test it more tomorrow and checkin. I may start using the caching mechanism present in the dnsjava ...
G.B.Reddy
gbreddysoft
Mar 12, 2003
4:15 pm
26
Gordon, I have checked in the first version of the asynchronous DNS lookup stage (DNSLookingUp.java). I have also updated the README and the anecdote.cfg file...
G.B.Reddy
gbreddysoft
Mar 17, 2003
8:08 pm
27
I'll take a look. Don't feel obligated to go with Eclipse -- even though it is a very nice environment. Eventually we'll include versioned ant scripts with...
Gordon Mohr
gojomo
Mar 17, 2003
11:42 pm
28
Gordon, Yes, as you said dnsjava creates a new udpsocket for every message. I am planning to separate out the processing logic from the socket related code and...
G.B.Reddy
gbreddysoft
Mar 18, 2003
2:20 am
29
I'm trying out the 'libhttp' staged HTTP code we were passed by the Berkeley OceanStore project, and it requires all aspects of the outbound request to be...
Gordon Mohr
gojomo
Mar 19, 2003
7:38 pm
30
Gordon Mohr
gojomo
Mar 19, 2003
8:59 pm
31
As I understand it, the largest header Mercator will set is: GET /foo.html HTTP/1.0 User-Agent: Mercator-1.0 Host: foo.com From:...
Raymie Stata
rstata
Mar 19, 2003
9:19 pm