Hi,
The content inside the WARC files is encoded as N-Triples; see the sample
code (added to the files of the Yahoo! Group, see [1]) for how to extract it.
Once you have the N-Triples you can, as you say, process them using any
RDF library. The sample code shows how to count the triples using Sesame.
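
To give a rough idea of the counting step, here is a minimal sketch. It is
not the WarcReader sample and it does not use Sesame: it assumes the
N-Triples text has already been extracted from the WARC record, that each
statement sits on its own line ending in " .", and that comment lines start
with "#". The class and method names (NTriplesCounter, looksLikeTriple,
countTriples) are made up for illustration, and nothing is validated; a real
RDF parser should be used for anything beyond a quick count.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of WarcReader.java or Sesame.
class NTriplesCounter {

    // Naive test: a statement line is non-blank, is not a comment,
    // and ends with the terminating '.'. Terms are not validated.
    static boolean looksLikeTriple(String line) {
        String s = line.trim();
        return !s.isEmpty() && !s.startsWith("#") && s.endsWith(".");
    }

    // Counts the lines that look like N-Triples statements.
    static long countTriples(Reader in) {
        return new BufferedReader(in).lines()
                .filter(NTriplesCounter::looksLikeTriple)
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for an extracted WARC payload.
        String sample =
            "# a comment line\n"
            + "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
            + "\n"
            + "<http://example.org/a> <http://example.org/name> \"Alice\" .\n";
        System.out.println(countTriples(new StringReader(sample)));
    }
}
```

Counting lines like this is only a sanity check; a parser such as the one in
Sesame or Jena will also catch malformed statements.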
Best,
Peter
[1] http://f1.grp.yahoofs.com/v1/gLOhSKL3vLBuPUqI5V-eqZBzl0sjZGF52nYvngkFkAg-JaeZUgrYYx75kRXm7qz0uSZZeTvYnzWCU2lfNJi0hA/WarcReader.java
--- In billiontriples@yahoogroups.com, "huanxuezhou" <huanxuezhou@...>
wrote:
>
> Thanks for the explanation. Personally, I would really appreciate it if
> you could encode the files in N-Triples format, since this format
> represents RDF well and can be easily parsed by Jena.
>