There have been numerous discussions about the proper frequency at
which to poll an RSS or Atom feed, and also many discussions about how
feeds might inform aggregators about their change frequency. The
general consensus seems to be that polling once every half-hour is
acceptable.
Now that many sites are offering APIs, I am curious what the proper
behavior is for using them, particularly when one aggregator is
serving many users. Many sites require a registered key, presumably so
that abusive usage can be detected and banned, and some sites limit
the number of calls that can be made in a given period.
All of this makes sense. Sort of.
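
To show what a per-period call limit looks like from the aggregator's
side, here is a minimal sketch in Python of a sliding-window budget
shared by every user behind one key. The class name CallBudget, the
method try_acquire, and the numbers used are all made up for
illustration; real services publish their own limits and may count
calls differently.

```python
import time
from collections import deque


class CallBudget:
    """Allow at most max_calls within any rolling window of period_seconds.

    Hypothetical helper: the limits are placeholders, not any site's real quota.
    """

    def __init__(self, max_calls, period_seconds, clock=time.monotonic):
        self._max_calls = max_calls
        self._period = period_seconds
        self._clock = clock          # injectable so the window can be tested
        self._stamps = deque()       # times of calls still inside the window

    def try_acquire(self):
        """Return True and record a call if the budget allows it, else False."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._period:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            self._stamps.append(now)
            return True
        return False
```

An aggregator would check try_acquire() before each outgoing request
and, when it returns False, serve older data or ask the user to try
again later instead of calling the site.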
If I am a web-based aggregator, I can grab the same RSS feed once, and
serve it to all my users. They are all looking for the same thing.
However, if I allow my users to query an outside service through an
API, typically, they are all going to be asking for slightly different
data. I can't make one call to the API and serve the results to
everyone for the next half-hour. I can cache individual queries, but I
am generally going to be making many more calls to the site than if I
were simply grabbing the feed.
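
As a rough illustration of caching individual queries, here is a
minimal Python sketch. The names QueryCache and fetch are hypothetical,
fetch stands in for whatever code actually calls the outside API, and
the 1800-second lifetime simply borrows the half-hour figure from feed
polling; it is an assumption, not a recommendation from any service.

```python
import time


class QueryCache:
    """Cache API results per distinct query for a fixed time-to-live."""

    def __init__(self, fetch, ttl_seconds=1800, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable: fetch(service, params)
        self._ttl = ttl_seconds      # assumed lifetime: half an hour
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, result)
        self.api_calls = 0           # how many real calls were made

    def get(self, service, params):
        """Return a cached result, calling the API only on a miss or expiry.

        params is a dict of query parameters with hashable values.
        """
        # Sort the parameters so the same query always maps to the same key.
        key = (service, tuple(sorted(params.items())))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        result = self._fetch(service, params)
        self.api_calls += 1
        self._entries[key] = (now + self._ttl, result)
        return result
```

Two users asking for the same tag within the half-hour cost one call,
but two users asking for different tags still cost two, which is
exactly why the call count grows with the number of users.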
What approach should be taken to limit abusive use of the API, yet
still make use of its functionality in a way that is meaningful to end
users? I can't imagine asking each user to bring their own API key.
Most wouldn't know what one is or where to get one.
This is not just a rhetorical question. My aggregator, fyuze.com,
allows users to tap into the APIs of other sites (Flickr,
Upcoming.org, Technorati, Amazon) and I want to have an open
discussion about how to harness these APIs to provide the user with
interesting data without abusing the services that provide it.
Justin Klubnik.