Hi Marcel,
On Sat, Jun 17, 2006 at 09:58:53PM -0000, marcelferrante wrote:
> Hi everybody. I'm new to this mailing list, so please be patient if I am
> repeating questions that have already been answered.
>
> 1 - Beyond Berkeley DB, is there some other data structure used to
> manipulate a very big repository of RDF graphs?
> a) In the information retrieval area, inverted files and suffix
> arrays are used. Which groups are studying that?
We use inverted files (Lucene) for indexing the strings in an RDF graph.
For the index on the triples/quads, I don't immediately see how
inverted files could be used here.
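To make the string-indexing part concrete, here is a minimal sketch of an inverted file over the literals in a set of triples. It is plain Python written only for illustration, not Lucene and not our actual code; the `Literal` wrapper, the sample triples, and the function names are all assumptions made up for this example.

```python
import re
from collections import defaultdict, namedtuple

# Hypothetical wrapper to tell literal objects apart from URI objects.
Literal = namedtuple("Literal", "value")

# Made-up sample data, purely illustrative.
TRIPLES = [
    ("ex:paper1", "dc:title", Literal("Indexing RDF graphs with B+ trees")),
    ("ex:paper2", "dc:title", Literal("Inverted files for text retrieval")),
    ("ex:paper1", "dc:creator", "ex:alice"),
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(triples):
    """Map each term to the positions of the triples whose literal contains it."""
    index = defaultdict(set)
    for pos, (_s, _p, o) in enumerate(triples):
        if isinstance(o, Literal):  # only strings are indexed, not URIs
            for term in tokenize(o.value):
                index[term].add(pos)
    return index

def search(index, triples, query):
    """Return the triples whose literal contains every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [triples[i] for i in sorted(hits)]

INDEX = build_inverted_index(TRIPLES)
```

With this, `search(INDEX, TRIPLES, "rdf graphs")` finds the first triple by its title words. Note that the posting lists only help with keyword lookups on literals; they say nothing about which subject/predicate/object combinations exist, which is why a separate index is needed for the triples themselves.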
> 2 - There are other groups studying RDF persistent storage. How does the
> YARS group manage the relationship with them? (Sesame, RDF store,
> RDFdb, RAP, JENA)
We use B+ trees for a complete index on the graph topology; Redland uses
hash tables for the most common access patterns; Jena and Sesame use
RDBMS as backend storage systems; Sesame also has a native B tree
implementation; Kowari uses AVL trees.
In my experience, Sesame's native B tree implementation is quite
scalable. However, BerkeleyDB has sophisticated caching and locking,
and should be more appropriate in a multi-threaded environment.
I don't have any experience with Kowari, but it might be worth a try.
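As a rough illustration of what a "complete index" on the triples can mean, the sketch below keeps the triples sorted in three orderings (SPO, POS, OSP), so that any combination of bound subject, predicate, and object becomes a prefix range scan in one of them. Sorted Python lists with `bisect` stand in for B+ trees here. The class name, the choice of orderings, and the sample data are assumptions for this example, not a description of the actual YARS index layout.

```python
from bisect import bisect_left

# Each ordering lists the positions of (s, p, o) in sort-key order.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

class TripleIndex:
    """Hypothetical in-memory stand-in for a set of B+ tree indexes."""

    def __init__(self, triples):
        unique = set(triples)
        self.indexes = {
            name: sorted(tuple(t[i] for i in perm) for t in unique)
            for name, perm in ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        # Pick the ordering in which the bound terms form a key prefix.
        for name, perm in ORDERS.items():
            key = [pattern[i] for i in perm]
            bound = [k for k in key if k is not None]
            if key[:len(bound)] == bound:
                break
        prefix = tuple(bound)
        rows = self.indexes[name]
        results = []
        # Range scan: start at the first row >= prefix, stop at first mismatch.
        for i in range(bisect_left(rows, prefix), len(rows)):
            row = rows[i]
            if row[:len(prefix)] != prefix:
                break
            triple = [None, None, None]
            for key_pos, spo_pos in enumerate(perm):
                triple[spo_pos] = row[key_pos]  # restore (s, p, o) order
            results.append(tuple(triple))
        return results

# Made-up sample data, purely illustrative.
SAMPLE = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:knows", "ex:c"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", "Alice"),
]
IDX = TripleIndex(SAMPLE)
```

For example, `IDX.match(s="ex:a", o="ex:c")` is answered from the OSP ordering with the prefix `("ex:c", "ex:a")`. The price of covering every access pattern this way is storing each triple once per ordering; quads would need more orderings than the three shown.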
> 3 - What is the state of the art in RDF persistent storage for very, very
> big RDF graph repositories, in your opinion?
Our version of DBLP has more than 11 million triples, and query response
times are well below 2 sec.
What do you mean by very, very big RDF graphs? Which datasets are you looking at?
Regards,
Andreas.