We've been looking at this issue for a while, and our preferred option
would be to use something like ICE to allow "crawlers" to ask us
what's changed since their last check; we can then simply send them a
list of all changed data. Of course, this requires the crawlers to do
more work, but in return I'd give them nice, consistent, easy-to-process
XML data. Since all of our data is in a database (i.e. we can
easily know what's changed, and ship only changes), and we run ICE,
this is an extremely easy and efficient combination. The motivation
isn't just to keep load off of our servers and database; it's driven
more by a desire to provide timely and accurate information to users
even if they're on other sites.
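
A minimal sketch of the "what's changed since your last check" idea,
assuming a hypothetical table content(id, title, body, modified) where
`modified` is a Unix timestamp bumped on every write. The XML element
names (changes, item, title, body) are made up for illustration and are
not ICE's actual payload format; the point is only that a database makes
the incremental query a single WHERE clause.

```python
import sqlite3
import xml.etree.ElementTree as ET


def changes_since(conn: sqlite3.Connection, last_check: int) -> str:
    """Return an XML document listing every row modified after last_check.

    Assumes a hypothetical schema: content(id, title, body, modified).
    Element names are illustrative, not the ICE message format.
    """
    rows = conn.execute(
        "SELECT id, title, body, modified FROM content "
        "WHERE modified > ? ORDER BY modified",
        (last_check,),
    ).fetchall()

    # "latest" tells the crawler which timestamp to send on its next check;
    # if nothing changed, it simply echoes last_check back.
    latest = max((row[3] for row in rows), default=last_check)
    root = ET.Element("changes", since=str(last_check), latest=str(latest))
    for row_id, title, body, modified in rows:
        item = ET.SubElement(root, "item", id=str(row_id), modified=str(modified))
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "body").text = body
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT, "
        "body TEXT, modified INTEGER)"
    )
    conn.executemany(
        "INSERT INTO content VALUES (?, ?, ?, ?)",
        [(1, "Old page", "unchanged", 100), (2, "New page", "edited", 200)],
    )
    # A crawler that last checked at t=150 receives only row 2.
    print(changes_since(conn, 150))
```

Returning the "latest" timestamp lets the crawler keep a single cursor
instead of re-fetching pages to discover what moved.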

Given that the web spiders we've talked to are, oddly enough, more
comfortable processing HTML than XML (least common denominator: it's
more work, and fragile, but it works everywhere and they already know
how to do it), it looks like it would be up to us, as the owner of the
content, to force the issue: block all spiders (that we know of) so
that they have to use the mechanism we prefer, because it places
almost no load on our systems.
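
One way the "block all spiders that we know of" step could look, as a
rough sketch: a WSGI middleware that refuses ordinary pages to
self-identified crawlers while still serving the change feed. The
KNOWN_SPIDERS substrings and the FEED_PATH value are hypothetical
placeholders, and User-Agent matching only catches spiders that
announce themselves honestly.

```python
from typing import Callable, Iterable

# Hypothetical User-Agent substrings for the spiders "that we know of".
KNOWN_SPIDERS = ("googlebot", "slurp", "scooter")

# Hypothetical path where the incremental change feed is served.
FEED_PATH = "/changes"


def block_known_spiders(app: Callable) -> Callable:
    """WSGI middleware: refuse HTML pages to known spiders, allow the feed."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        path = environ.get("PATH_INFO", "")
        if path != FEED_PATH and any(s in agent for s in KNOWN_SPIDERS):
            # Point the crawler at the cheap mechanism instead of the pages.
            body = f"Crawlers: please use {FEED_PATH} for changes.\n".encode()
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else (and the feed itself) goes to the real application.
        return app(environ, start_response)

    return wrapper
```

Wrapping the site application this way leaves the pages untouched for
human visitors; only requests that both look like a known spider and
are not for the feed get turned away.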

It hasn't come to the top of our list of things to do (silly things
like "features" keep coming up), but it's on the list of things to do
some day.

--- In cms-vendor@egroups.com, "Jeff Barr" <jeff@v...> wrote:
> I've got two "bombardment" war stories on my EditThisPage
> site right now:
>
> http://jeffbarr.editthispage.com/discuss/msgReader$49
>
> I am still working to resolve both issues. The bad thing is
> that stuff like this happens. The good thing is that it is
> actually possible to find the responsible parties.
>
> Jeff;
>
> -----Original Message-----
> From: Peter Friedman [mailto:peter@c...]
> Sent: Wednesday, November 29, 2000 5:15 AM
> To: cms-vendor@egroups.com
> Subject: RE: [cms-vendor] Re: The problem, and why the "solution" may be
> hard
>
>
> Time to start putting together a 'crawler bombardment response' FAQ yet?
>
> -----Original Message-----
> From: Dave Winer [mailto:dave@u...]
> Sent: Tuesday, November 28, 2000 10:57 PM
> To: cms-vendor@egroups.com
> Subject: Re: [cms-vendor] Re: The problem, and why the "solution" may be
> hard
>
>
> We have many domains per IP address. That's what we'd like the search
> engine guys to account for. Dave
>
>
>
> To unsubscribe from this group, send an email to:
> cms-vendor-unsubscribe@egroups.com