Dear All, In the past few days we have talked to several of you about providing data for the billion triples challenge. I would like to start a brief discussion...
Dear All, We are looking for people or organizations who would like to help host the Billion Triples data set. This is an important part of...
My two cents: In the spirit of RDF, why not provide a 'directory' triple file that has resources identifying each file and provides timestamps, provenance etc...
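A directory file like the one proposed above could be produced along these lines. This is a minimal sketch, not part of the proposal: it assumes Dublin Core Terms (`dcterms:modified`, `dcterms:source`) for timestamps and provenance and N-Triples as the output syntax, and `DataFile` and `directory_triples` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Dublin Core Terms and XML Schema namespaces; choosing these
# vocabularies is an assumption, not something the proposal specifies.
DCT = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


@dataclass
class DataFile:
    uri: str       # URI identifying one data file (hypothetical layout)
    modified: str  # ISO 8601 timestamp of the crawl or last change
    source: str    # URI of the site the triples came from (provenance)


def directory_triples(files: List[DataFile]) -> List[str]:
    """Return N-Triples lines that describe each data file."""
    lines = []
    for entry in files:
        lines.append(f'<{entry.uri}> <{DCT}modified> "{entry.modified}"^^<{XSD_DATETIME}> .')
        lines.append(f"<{entry.uri}> <{DCT}source> <{entry.source}> .")
    return lines


if __name__ == "__main__":
    demo = [DataFile("http://example.org/btc/part-001.nt",
                     "2008-02-01T12:00:00Z",
                     "http://dbpedia.org/")]
    print("\n".join(directory_triples(demo)))
```

Because the directory is itself RDF, it can be loaded and queried with the same tools as the data it describes.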
Hi, I'm new to this discussion list, so let me introduce myself: I'm Marc-Alexandre Nolin from the Bio2RDF project (http://bio2rdf.org). Is the billion triples...
Hi, as already discussed, I'd prefer this solution. Filenames in the ZIP archive are the URL-encoded URIs of the files. Actually,...
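The naming scheme above (each ZIP member named after the URL-encoded URI of its source document) might look like this. It is a sketch under assumptions: percent-encoding via Python's `urllib.parse.quote` with no safe characters, and the hypothetical helpers `uri_to_member_name`, `member_name_to_uri` and `build_archive`.

```python
import io
import zipfile
from typing import Dict
from urllib.parse import quote, unquote


def uri_to_member_name(uri: str) -> str:
    # safe="" makes quote() percent-encode "/" and ":" too, so each URI
    # becomes one flat file name with no directory components.
    return quote(uri, safe="")


def member_name_to_uri(name: str) -> str:
    return unquote(name)


def build_archive(docs: Dict[str, str]) -> bytes:
    """Pack {source URI: Turtle text} into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for uri, turtle in docs.items():
            zf.writestr(uri_to_member_name(uri), turtle)
    return buf.getvalue()


if __name__ == "__main__":
    data = build_archive({"http://example.org/foaf.rdf": "<#me> a <#Person> ."})
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            print(name, "->", member_name_to_uri(name))
```

Encoding every reserved character keeps the mapping reversible, so the original URI can always be recovered from the member name.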
... Jim & Peter, As we do with DBpedia[1][2], we are happy to be one of hopefully numerous RDF data store providers for this effort. Count OpenLink Software in...
What licensing terms will the data be issued under? I encourage this project to adopt the ODC Public Domain Dedication and Licence, a licence that Talis and...
Following some inquiries, I'd like to clarify that it's not the main Sindice infrastructure providing a SPARQL endpoint (e.g. over the entire dataset); it's just...
Ian, good point. We will work hard to make sure all the data is freely shareable and displayable; having a good license that makes that clear would make a lot...
Hi Andreas, I like this solution as well; the only thing I'm slightly worried about now is what happens when you unzip a large number of files. My extended ...
Hi Peter, In my experience, file systems run into trouble at some point when there are too many files around. Thus, we avoid writing individual files ...
Hi list, I quite like this last solution for one very selfish reason: it is very similar to the way the Watson cache is organized. For example, ...
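One way to sidestep the too-many-files concern is to never unzip the archive and instead read each member directly from it. This is a sketch, not the approach any poster described: it uses Python's standard `zipfile` module, assumes UTF-8 content and percent-encoded member names, and `iter_documents` is a hypothetical name.

```python
import io
import zipfile
from urllib.parse import unquote


def iter_documents(source):
    """Yield (source URI, text) for every member of a ZIP archive.

    `source` may be a path or a binary file object. Members are read
    straight from the archive, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as member:
                # Member names are assumed to be percent-encoded URIs.
                yield unquote(info.filename), member.read().decode("utf-8")


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("http%3A%2F%2Fexample.org%2Fa.ttl", "<a> <b> <c> .")
    for uri, text in iter_documents(buf):
        print(uri, len(text))
```

Only one member is held in memory at a time, and the file system sees a single archive file regardless of how many documents it contains.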
Dear All, After some long and careful consideration, we have made the decision not to invent our own format for exchanging data but to rely on an existing ...
Hi Peter, I'm not entirely sure what you are going to give us access to. You (if everything goes right at Yahoo) will give us access to a 100 GB crawl in...
Hi Jans, The plan is to have the entire dataset available for download in the WARC format as a set of files. (Some users may have limitations storing files...
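To show roughly what consuming the WARC files involves, here is a minimal reader. It is a sketch under stated assumptions: it follows the record layout of the WARC/1.0 specification (version line, header fields, blank line, `Content-Length` bytes of content, then CRLF CRLF), handles only uncompressed files, and `iter_warc_records` is a hypothetical name. The draft WARC versions current at the time may differ in details, and per-record gzip files would need to be opened with `gzip.open` first.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, block) for each record of an uncompressed WARC stream.

    Minimal sketch: header names are kept case-sensitive and only
    CRLF line endings are accepted.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # clean end of file
        if not version.strip():
            continue  # tolerate stray blank lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"not a WARC record: {version!r}")
        headers = {}
        for line in iter(stream.readline, b"\r\n"):  # stop at blank line
            if not line:
                raise ValueError("truncated WARC header block")
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        block = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # every record ends with CRLF CRLF after its block
        yield headers, block


if __name__ == "__main__":
    record = (b"WARC/1.0\r\n"
              b"WARC-Type: resource\r\n"
              b"WARC-Target-URI: http://example.org/data.rdf\r\n"
              b"Content-Length: 5\r\n"
              b"\r\n"
              b"hello\r\n\r\n")
    for headers, block in iter_warc_records(io.BytesIO(record * 2)):
        print(headers["WARC-Target-URI"], block)
```

Reading records sequentially from one large file keeps the number of files small, which also addresses the file-system concerns raised earlier in the thread.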
Hello Peter, Do we have any code written in Jena? - Amit
Hi Amit, No, I don't, as I'm not familiar with Jena. But basically, the MeasurableInputStream that you get as a result of the response.contentAsStream() call on...
Thanks for the info. - Amit
** Our apologies if you receive multiple copies of this message ** CALL FOR PAPERS ESWC 2008...
All- Peter feels that we now have the collection and distribution of the triples underway, which means he gets to make me do some work finally... My role at...
Here are my views. Triple store: the big problem with the Semantic Web, no matter how big its promises, is the number of triples that can be stored and dealt...
Dear Jim, dear all, May I propose another measurement criterion or facet of the challenge: Can a user interactively do something useful with the data? I think that...
Hi All, The CFP and the dataset for the Billion Triples Challenge have been posted at [1]. Please let us know of any immediate problems you see with accessing...