sanjayvenkat2000 wrote:
> I have an application where I need to update some resources in a dataset
> (MasterModel). The information for the updates comes from another model
> (UpdateModel).
>
> I would like to get your opinion on how best to proceed. Data is stored using
> TDB named graphs.
>
> Plan A:
> Use the Jena API to iterate over Resources in the UpdateModel.
> For each Resource, identify properties in the UpdateModel.
> Get MasterModel resource and delete identified properties.
> Add identified properties from UpdateModel into MasterModel.
>
> Plan B:
> Construct a string containing a SPARUL update by listing the statements in the model.
>
> DELETE GRAPH <MasterModel> {
> <UpdateModel:uri> <UpdateModel:prop> ?any
> ...
> ...
> }
>
> INSERT GRAPH <MasterModel> {
> <UpdateModel:uri> <UpdateModel:prop> <UpdateModel:val>
> ...
> ...
> }
>
>
> I see Plan B getting into trouble if the UpdateModel is large as I will
> unnecessarily create a query string that will get parsed again.

Normally the cost of the disk accesses involved in the update will swamp
everything else, so the cost of parsing the SPARUL is very unlikely to
be a bottleneck.

> With Plan A, I fear leaving hanging blank nodes around.

You have to account for bNodes in either case. If you have bNodes then
you'll need to scan the to-be-deleted data for bNodes, reference count
them, and add any that become unreferenced to the deletion list. In that
case I'd be inclined towards Plan A, since you are scanning anyway.

If you do have blank nodes but are guaranteed they are only tree
structured, then you simply need to scan the object values of the
to-be-deleted properties and, for any that are bNodes, delete all of
their properties, recursively checking those object values for further
bNodes.

If you are not guaranteed that the bNodes are tree structured, but they
form only lattices with no cycles, then you can use reference counting to
decide when to delete. In the most general case, where bNodes can form
cycles, reference counting isn't sufficient, because the nodes in a cycle
keep each other's counts above zero.

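
A rough sketch of the reference-counting variant, again in Python over a
plain set of (subject, predicate, object) tuples rather than the Jena
API. The "_:" bNode convention and the function names are assumptions
for the example, and the count is recomputed by scanning the graph
instead of being maintained incrementally.

```python
def is_bnode(term):
    # Assumption for this sketch: bNodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def reference_count(graph, node):
    """Number of triples that still point at `node` as their object."""
    return sum(1 for t in graph if t[2] == node)


def delete_property_refcounted(graph, subject, predicate):
    """Delete all (subject, predicate, *) triples. A bNode object is
    cleaned up only once nothing else in the graph references it."""
    pending = [t for t in graph if t[0] == subject and t[1] == predicate]
    while pending:
        triple = pending.pop()
        if triple not in graph:
            continue  # already removed via another path
        graph.discard(triple)
        obj = triple[2]
        if is_bnode(obj) and reference_count(graph, obj) == 0:
            # Last reference is gone: queue the bNode's own properties.
            pending.extend([t for t in graph if t[0] == obj])


shared = {
    ("ex:a", "ex:addr", "_:b1"),
    ("ex:b", "ex:addr", "_:b1"),
    ("_:b1", "ex:city", "Bristol"),
}
delete_property_refcounted(shared, "ex:a", "ex:addr")  # _:b1 survives
delete_property_refcounted(shared, "ex:b", "ex:addr")  # now _:b1 goes too
```

With a cycle such as _:x pointing at _:y and _:y pointing back at _:x,
removing the only outside reference leaves both bNodes in the graph,
which is the case where counting alone is not enough.
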
> I have noticed model.difference() but not sure which way to proceed.

Model.difference creates a new in-memory model holding the statements of
your base model less those of the model being subtracted, so it is not
relevant in your case.

Dave