I am thrilled to see you pursuing this, Andrew! I know i won't have
time for at least a few days to really look at what you're doing, but
wanted to dump some thoughts i've acquired while accumulating the
Contributed.html/contrib catalogues on python.org.
First of all, not too long ago i made an effort to regularize the
_formatting_ of the http://www.python.org/python/Contributed.html
collection. I'm not sure whether it's airtight, but i tried to make it
possible to automatically parse the outermost structure of the entries
(particularly the <dt> part) to extract some fields for each of the
packages: URL, pkg_name, terse_descr, author/author_address(es),
followed by the long descr in the <dd> part. A few entries have sub
entries, which probably will call for human intervention. (I've been
hoping to have a moment to see how the local collection and links
collections would reconcile, but it never got important enough, sigh.)
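
A minimal sketch of the kind of automatic extraction described above. The entry layout is an assumption for illustration only (short fields in the <dt>, separated by " - ", with the long description in the <dd>); the real Contributed.html markup may differ, and the sample data, EntryParser and parse_entries are hypothetical names, not anything that exists on python.org.

```python
from html.parser import HTMLParser

# Hypothetical sample in the ASSUMED layout; not copied from Contributed.html.
SAMPLE = """
<dl>
<dt><a href="http://example.org/foo.tar.gz">foo</a> - Terse description - A. Author (author@example.org)
<dd>Longer description of the foo package,
spanning more than one line.
<dt><a href="http://example.org/bar.tar.gz">bar</a> - Another package - B. Author (b@example.org)
<dd>Long description of bar.
</dl>
"""


class EntryParser(HTMLParser):
    """Collect one raw record per <dt>/<dd> pair (closing tags are optional)."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._part = None       # "dt", "dd", or None when outside an entry
        self._in_link = False   # True while inside the entry's first <a>

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.entries.append(
                {"url": None, "pkg_name": "", "head": "", "long_descr": ""})
            self._part = "dt"
        elif tag == "dd" and self.entries:
            self._part = "dd"
        elif tag == "a" and self._part == "dt":
            entry = self.entries[-1]
            if entry["url"] is None:    # only the first link names the package
                entry["url"] = dict(attrs).get("href")
                self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "dl":
            self._part = None

    def handle_data(self, data):
        if self._part is None or not self.entries:
            return
        entry = self.entries[-1]
        if self._part == "dd":
            entry["long_descr"] += data
        elif self._in_link:
            entry["pkg_name"] += data
        else:
            entry["head"] += data


def parse_entries(html):
    """Return a list of dicts with url, pkg_name, terse_descr, author, long_descr."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    for entry in parser.entries:
        # The text after the link is assumed to be " - terse descr - author".
        parts = [p.strip() for p in entry.pop("head").split(" - ") if p.strip()]
        entry["terse_descr"] = parts[0] if parts else ""
        entry["author"] = parts[1] if len(parts) > 1 else ""
        entry["pkg_name"] = entry["pkg_name"].strip()
        entry["long_descr"] = " ".join(entry["long_descr"].split())
    return parser.entries
```

Calling parse_entries(SAMPLE) yields one dict per package. Entries with sub-entries would not fit this flat record and would still need the human intervention mentioned above.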
There is also some categorical info about each entry to be gleaned,
according to (1) where they are embedded in the doc (sections), and
(2) the contents of an optional "see also" note that follows several
of the entries.
In the process of accumulating this stuff over time, i've come to
identify a few things that i think would be highly desirable.
0 Consolidating the info about the local collection and the links into
a single db!! (I figure everyone feels this way, but couldn't help
mentioning it.)
1 A mechanism for associating keywords with entries, and for finding
the items according to keywords. This would be a refinement of the
categorical approach i've simplistically taken with the uploaded
(directories) and listed (document sections) collections i've
accumulated at python.org.
I think the collection of keywords should be well defined but
extensible. I.e., contributors select from the existing set, but can
suggest additions when they see no alternative. However, all
additions would require approval, and the aim of the review would
be to admit only clear and necessary extensions - i.e., only when
nothing existing will do, and the new keyword (or a better
alternative) looks like a real good fit. This would promote good
keywords for use by the people searching for software...
(The keywords at least could be used to provide the categorical
views that the contrib typed directories and html list sections
provide. In fact, i've often envisioned a CGI app which presents
items based on categorical keywords, doing dynamically what the
html page layout currently does statically and in one big bunch.)
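
One way the keyword mechanism in item 1 might look, as a rough sketch only: a fixed but extensible vocabulary, additions that sit in a pending set until a reviewer approves them, and lookup by one or more keywords. KeywordCatalogue and all of its method names are hypothetical, and an in-memory dict stands in for whatever db would really hold the entries.

```python
class KeywordCatalogue:
    """Entries tagged from an approved keyword set; new keywords need review."""

    def __init__(self, approved):
        self.approved = set(approved)   # the well-defined vocabulary
        self.pending = set()            # suggested additions awaiting review
        self.index = {}                 # keyword -> set of entry names

    def propose(self, keyword):
        """A contributor suggests a keyword; it is not usable until approved."""
        if keyword not in self.approved:
            self.pending.add(keyword)

    def approve(self, keyword):
        """A reviewer admits a pending keyword into the vocabulary."""
        if keyword not in self.pending:
            raise KeyError("%r was never proposed" % keyword)
        self.pending.remove(keyword)
        self.approved.add(keyword)

    def tag(self, entry, keywords):
        """Associate an entry with keywords, all of which must be approved."""
        unknown = set(keywords) - self.approved
        if unknown:
            raise ValueError("unapproved keywords: %s" % sorted(unknown))
        for keyword in keywords:
            self.index.setdefault(keyword, set()).add(entry)

    def find(self, *keywords):
        """Return the sorted names of entries carrying every given keyword."""
        if not keywords:
            return []
        matches = [self.index.get(keyword, set()) for keyword in keywords]
        return sorted(set.intersection(*matches))
```

A categorical view like the current html sections would then be a loop over the approved keywords calling find() on each, which is roughly what the CGI idea above would do per request.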
2 A mechanism whereby users can register quantitative feedback about
the robustness and importance of items they particularly value, and
a corresponding mechanism for collating the feedback to pinpoint e.g.
must-have items (numpy for those doing math, ilu for those doing
distributed objects, etc.)
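
The feedback idea in item 2 could be sketched along these lines. Everything here is an assumption made for illustration: the 1-5 scale, one vote per user per item, averaging as the collation rule, and the thresholds for "must-have". FeedbackTally and its methods are hypothetical names.

```python
from statistics import mean


class FeedbackTally:
    """Collect per-user (robustness, importance) scores and collate them."""

    def __init__(self):
        self._votes = {}    # item -> {user: (robustness, importance)}

    def register(self, user, item, robustness, importance):
        """Record one user's scores (1-5 each); a repeat vote replaces the old one."""
        for score in (robustness, importance):
            if not 1 <= score <= 5:
                raise ValueError("scores must be between 1 and 5")
        self._votes.setdefault(item, {})[user] = (robustness, importance)

    def summary(self, item):
        """Return the vote count and mean scores for an item, or None if unrated."""
        votes = list(self._votes.get(item, {}).values())
        if not votes:
            return None
        return {
            "votes": len(votes),
            "robustness": mean([r for r, _ in votes]),
            "importance": mean([i for _, i in votes]),
        }

    def must_haves(self, min_votes=2, min_importance=4.0):
        """Items with enough votes and high mean importance, best first."""
        picks = []
        for item in self._votes:
            stats = self.summary(item)
            if stats["votes"] >= min_votes and stats["importance"] >= min_importance:
                picks.append((stats["importance"], item))
        picks.sort(key=lambda pick: (-pick[0], pick[1]))
        return [item for _, item in picks]
```

Combined with keywords, must_haves() could be restricted to the items under one category, which is how "numpy for those doing math" would fall out.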
Well, this is a very rough dump off the top of my head. I hope it's
useful...
Ken Manheimer klm@... 703 620-8990 x268
Corporation for National Research Initiatives
_______________
LOCATOR-SIG: Discussions about a Python Locator for resource discovery
send messages to: locator-sig@...
administrivia to: locator-sig-request@...
_______________