jena-dev · Jena Developers
Speed question
Message #21436 of 42116
Hello,

I am a new Jena developer. I have what is probably a simple speed question.

Below is a very simple Jena/ARQ program. It is a toy standalone
program that builds a database of 10,000 trees describing families
(dad, mom, kids...), and then does various queries.

The queries are very simple (e.g. families where dad=="Peter"), but
the program runs *very* slowly. Typical queries take 40 seconds on my
AMD 64 box.

Is there something simple I should be doing (or not doing) to make
this run at a reasonable speed?

Thank you
Peter Wolf

import java.util.ArrayList;
import java.util.List;

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryParseException;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;

public class JenaSpeedTest {

    /**
     * neighborhood --> family *
     *
     * family --> id, dad, mom, child *, pet *
     *
     * dad --> person
     * mom --> person
     * child --> person
     * pet --> animal
     *
     * person --> name age hairColor
     * animal --> name age breed
     */

    Model model;
    Property id;
    Property dad;
    Property mom;
    Property child;
    Property pet;
    Property name;
    Property age;
    Property hair;
    Property breed;

    int treeCount = 10000;

    void printProperty(Property p) {
        System.out.println(p + " = " + p.getNameSpace() + " + "
                + p.getLocalName());
    }

    JenaSpeedTest() {

        model = ModelFactory.createDefaultModel();

        id = model.createProperty("http://id");
        dad = model.createProperty("http://dad");
        mom = model.createProperty("http://mom");
        child = model.createProperty("http://child");
        pet = model.createProperty("http://pet");
        name = model.createProperty("http://name");
        age = model.createProperty("http://age");
        hair = model.createProperty("http://hair");
        breed = model.createProperty("http://breed");

        //printProperty(pet);

        // Build the test data and report how long it took.
        long before = System.currentTimeMillis();

        for (int i = 0; i < treeCount; i++)
            addFamily(i);

        long then = System.currentTimeMillis();

        // model.write( System.out,"N-TRIPLE" ); //"RDF/XML-ABBREV" );
        // System.out.println(model);

        System.out.println((then - before) + " ms to create " + treeCount
                + " trees");

        // First pass over the three queries.
        timeQuery("SELECT ?id WHERE { ?family <http://id> ?id . "
                + "?family <http://dad> [ <http://name> \"Peter\" ]. " + "}");

        timeQuery("SELECT ?id WHERE { ?family <http://id> ?id . "
                + "?family <http://dad> [ <http://name> \"Peter\" ]. "
                + "?family <http://mom> [ <http://name> \"Robin\" ]. " + "}");

        timeQuery("SELECT ?id WHERE { ?family <http://id> ?id . "
                + "?family <http://dad> [ <http://name> \"Peter\" ]. "
                + "?family <http://mom> [ <http://name> \"Robin\" ]. "
                + "?family <http://pet> [ <http://name> \"Toller\" ]. " + "}");

        // Second pass: the same three queries again.
        timeQuery("SELECT ?id WHERE { ?family <http://id> ?id . "
                + "?family <http://dad> [ <http://name> \"Peter\" ]. " + "}");

        timeQuery("SELECT ?id WHERE { ?family <http://id> ?id . "
                + "?family <http://dad> [ <http://name> \"Peter\" ]. "
                + "?family <http://mom> [ <http://name> \"Robin\" ]. " + "}");

        timeQuery("SELECT ?id WHERE { ?family <http://id> ?id . "
                + "?family <http://dad> [ <http://name> \"Peter\" ]. "
                + "?family <http://mom> [ <http://name> \"Robin\" ]. "
                + "?family <http://pet> [ <http://name> \"Toller\" ]. " + "}");
    }

    void timeQuery(String query) {

        System.out.println("Timing Query:\n" + query);

        // Time building the query execution and calling execSelect().
        startTiming();
        ResultSet results = doQuery(query);
        stopTiming("doQuery");

        List<QuerySolution> cache = new ArrayList<QuerySolution>();

        // Time iterating over every solution.
        startTiming();
        while (results.hasNext()) {
            cache.add(results.nextSolution());
        }
        stopTiming("nextSolution");

        System.out.println(cache.size() + " results");
    }

    long then;
    long now;

    void startTiming() {
        then = System.currentTimeMillis();
    }

    void stopTiming(String name) {
        now = System.currentTimeMillis();
        System.out.println(name + ": " +
                (now - then) + " ms");
    }

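    void addFamily(int i) {
        // A family is a subject node, not a predicate, so create it as a Resource.
        Resource family = model.createResource("http://family" + i);
        model.add(family, id, Integer.toString(i));
        model.add(family, dad, addDad(i));
        model.add(family, mom, addMom(i));
        // Pick each count once; calling random(5) in the loop condition
        // would draw a new bound on every iteration.
        int childCount = random(5);
        for (int j = 0; j < childCount; j++) {
            model.add(family, child, addChild(i, j));
        }
        int petCount = random(5);
        for (int j = 0; j < petCount; j++) {
            model.add(family, pet, addPet(i, j));
        }
    }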

    String[] dads = { "Peter", "Justin", "John", "Frank", "Jeremy" };
    String[] moms = { "Robin", "Eleanor", "Liz", "Amy", "Leslie" };
    String[] children = { "Jeremy", "Eleanor", "Liam", "Catherine", "Liam",
            "Susan" };
    String[] hairs = { "Black", "Blond", "Brown", "Red", "Bald" };
    String[] breeds = { "Toller", "Beagle", "Mutt", "Tabby", "Siamese",
            "Lynx" };

    int random(int n) {
        return (int) (Math.random() * n);
    }

    String random(String[] names) {
        int i = random(names.length);
        return names[i];
    }

    // People are subject nodes, so they are created as Resources rather than Properties.
    Resource addDad(int i) {
        Resource pa = model.createResource("http://dad" + i);
        model.add(pa, name, random(dads));
        model.add(pa, hair, random(hairs));
        model.add(pa, age, 30 + random(30));
        return pa;
    }

    Resource addMom(int i) {
        Resource ma = model.createResource("http://mom" + i);
        model.add(ma, name, random(moms));
        model.add(ma, hair, random(hairs));
        model.add(ma, age, 30 + random(30));
        return ma;
    }

    Resource addChild(int i, int j) {
        Resource ta = model.createResource("http://child" + i + "x" + j);
        model.add(ta, name, random(children));
        model.add(ta, hair, random(hairs));
        model.add(ta, age, random(20));
        return ta;
    }

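    // Note: pets get a breed and an age but no name, so the queries above that
    // match a pet named "Toller" cannot return any results with this data.
    Resource addPet(int i, int j) {
        Resource ta = model.createResource("http://pet" + i + "x" + j);
        model.add(ta, breed, random(breeds));
        model.add(ta, age, random(15));
        return ta;
    }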

    private ResultSet doQuery(String query) {
        // The QueryExecution is never closed here because the caller
        // iterates over the returned ResultSet afterwards.
        QueryExecution qexec = null;
        try {
            qexec = QueryExecutionFactory.create(query, model);
            return qexec.execSelect();
        } catch (QueryParseException e) {
            throw new Error("Problem parsing query:\n" + query, e);
        } catch (Exception e) {
            throw new Error(
                    "Something went wrong while executing SELECT query", e);
        }
    }

    public static void main(String[] args) {
        new JenaSpeedTest();
    }

}

Wed Mar 8, 2006 3:47 pm

e7_edim7_am6_e
... What version of Jena are you running? -- Chris "sparqling" Dollin "Who do you serve, and who do you trust?"...
Chris Dollin
anover_alias
Mar 8, 2006
4:08 pm

... Jena 2.3...
e7_edim7_am6_e
Mar 8, 2006
6:24 pm

... Put the more specific part of the query first; it makes a significant difference. For example I rewrote the one above to: timeQuery( "SELECT ?id WHERE {...
Chris Dollin
anover_alias
Mar 8, 2006
4:26 pm
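
The rewritten query in the reply above is cut off in this archive, so its exact text is not recoverable here. As a rough illustration only, one way to put the most specific pattern first is to replace the blank node with a named variable and lead with the pattern that matches the literal "Peter". The variable name ?dad and the class name ReorderedQuery are assumptions made for this sketch, not taken from the reply.

```java
// Sketch only: builds two SPARQL strings that ask the same question.
// The variable ?dad and the class name are illustrative assumptions.
final class ReorderedQuery {

    // The first query from the posted program, with the general pattern first.
    static String asPosted() {
        return "SELECT ?id WHERE { ?family <http://id> ?id . "
                + "?family <http://dad> [ <http://name> \"Peter\" ] . }";
    }

    // The same question with the most selective pattern (the literal match) first.
    static String selectiveFirst() {
        return "SELECT ?id WHERE { "
                + "?dad <http://name> \"Peter\" . "
                + "?family <http://dad> ?dad . "
                + "?family <http://id> ?id . }";
    }

    public static void main(String[] args) {
        System.out.println(asPosted());
        System.out.println(selectiveFirst());
    }
}
```

Either string can be passed to timeQuery(...) in the posted program to compare timings on your own setup.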

Wow Chris, thank you very much. My time went from 33000ms --> 150ms. I am guessing that the internals of ARQ run each part of the query in series, and apply...
e7_edim7_am6_e
Mar 8, 2006
6:33 pm

... About. I believe that ARQ delegates the "easy" parts of the query to the Jena graph query mechanism and combines the results. "Easy" is a bunch of triple...
Chris Dollin
anover_alias
Mar 9, 2006
8:49 am

Is there work going on to automatically optimize queries? I would think it should often be possible to automatically improve the performance of queries by...
e7_edim7_am6_e
Mar 9, 2006
3:10 pm

... Not at present. ... Yes. But doing so is non-trivial and we have other priorities at the moment. ... SimpleTreeQueryPlan is a relic of an experiment that...
Chris Dollin
anover_alias
Mar 9, 2006
4:02 pm
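
To make the idea of automatic reordering concrete, here is a small sketch of one possible heuristic. It is hypothetical and is not something ARQ or Jena is said to do anywhere in this thread: repeatedly pick the remaining triple pattern with the most fixed terms, where a term counts as fixed if it is a constant or a variable already bound by an earlier pattern. Patterns are plain string arrays, variables start with "?", and the names GreedyReorder, fixedTerms, reorder and example are made up for the illustration. A real optimizer would also need statistics about the data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical greedy reordering of triple patterns; not ARQ's algorithm.
final class GreedyReorder {

    static boolean isVar(String term) {
        return term.startsWith("?");
    }

    // Counts constants plus variables that earlier patterns have already bound.
    static int fixedTerms(String[] pattern, Set<String> bound) {
        int n = 0;
        for (String term : pattern) {
            if (!isVar(term) || bound.contains(term)) n++;
        }
        return n;
    }

    // Greedily picks the pattern with the most fixed terms; ties keep input order.
    static List<String[]> reorder(List<String[]> patterns) {
        List<String[]> remaining = new ArrayList<>(patterns);
        List<String[]> ordered = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            int best = 0;
            for (int i = 1; i < remaining.size(); i++) {
                if (fixedTerms(remaining.get(i), bound)
                        > fixedTerms(remaining.get(best), bound)) {
                    best = i;
                }
            }
            String[] chosen = remaining.remove(best);
            ordered.add(chosen);
            for (String term : chosen) {
                if (isVar(term)) bound.add(term);
            }
        }
        return ordered;
    }

    // The first query of the posted program, written out as three triple patterns.
    static List<String[]> example() {
        List<String[]> patterns = new ArrayList<>();
        patterns.add(new String[] { "?family", "id", "?id" });
        patterns.add(new String[] { "?family", "dad", "?dad" });
        patterns.add(new String[] { "?dad", "name", "Peter" });
        return patterns;
    }

    public static void main(String[] args) {
        for (String[] p : reorder(example())) {
            System.out.println(String.join(" ", p));
        }
    }
}
```

Counting already-bound variables as fixed is what keeps the second choice connected to the first; sorting by constants alone could place two unrelated patterns next to each other and produce a cross product.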

Hello again, This email is a follow-on to my earlier thread finishing with 21470. I have been performing some speed experiments on Jena, and I have some ...
e7_edim7_am6_e
Mar 24, 2006
10:56 pm

A reply now - and hopefully Chris will see this. ... Thanks for the information. Your queries are (in SPARQL terms) a single basic graph pattern - Chris can ...
Seaborne, Andy
andyseaborne
Mar 25, 2006
2:49 pm

... Yes, that is surprising - I'd like to have the test code and data just to check that this is how it works locally. <fx:doubleTake/> OK, I see it. I'll try...
Chris Dollin
anover_alias
Mar 27, 2006
1:56 pm

Thanks Andy. ... stream for you ... Jena) ... time unless ... overhead for ... I'm not sure I understand this point. Will these factors affect the speed of...
e7_edim7_am6_e
Mar 27, 2006
3:38 pm

... triple pattern order does not affect the number of solutions returned, but it can affect the number of wrong answers: that is why putting the most selective ...
Seaborne, Andy
andyseaborne
Mar 27, 2006
8:08 pm
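
To see why pattern order changes the amount of work but not the answers, here is a toy sketch with no Jena dependency. It is not how ARQ or Jena evaluate queries: it assumes a naive nested-loop matcher over an in-memory list of triples, uses a deterministic data set instead of random names, and every name in it (PatternOrderDemo, load, solve, extend) is made up for the illustration. It counts the intermediate bindings each ordering produces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy nested-loop matcher; an illustration, not Jena's or ARQ's implementation.
final class PatternOrderDemo {

    // Each triple is {subject, predicate, object}.
    static final List<String[]> TRIPLES = new ArrayList<>();

    // Intermediate bindings produced by the most recent solve() call.
    static long intermediate;

    static final String[][] AS_POSTED = {
        { "?family", "id", "?id" },
        { "?family", "dad", "?dad" },
        { "?dad", "name", "Peter" },
    };

    static final String[][] SELECTIVE_FIRST = {
        { "?dad", "name", "Peter" },
        { "?family", "dad", "?dad" },
        { "?family", "id", "?id" },
    };

    // Builds one id, one dad and one dad name per family; every fifth dad is "Peter".
    static void load(int families) {
        String[] dads = { "Peter", "Justin", "John", "Frank", "Jeremy" };
        TRIPLES.clear();
        for (int i = 0; i < families; i++) {
            TRIPLES.add(new String[] { "family" + i, "id", Integer.toString(i) });
            TRIPLES.add(new String[] { "family" + i, "dad", "dad" + i });
            TRIPLES.add(new String[] { "dad" + i, "name", dads[i % dads.length] });
        }
    }

    // Returns the binding extended by this triple, or null if the triple does not match.
    static Map<String, String> extend(Map<String, String> binding,
                                      String[] pattern, String[] triple) {
        // Reject on constants first, before copying the binding.
        for (int k = 0; k < 3; k++) {
            if (!pattern[k].startsWith("?") && !pattern[k].equals(triple[k])) return null;
        }
        Map<String, String> out = new HashMap<>(binding);
        for (int k = 0; k < 3; k++) {
            if (!pattern[k].startsWith("?")) continue;
            String previous = out.putIfAbsent(pattern[k], triple[k]);
            if (previous != null && !previous.equals(triple[k])) return null;
        }
        return out;
    }

    // Evaluates the patterns left to right, scanning all triples at each stage.
    static List<Map<String, String>> solve(String[][] patterns) {
        intermediate = 0;
        List<Map<String, String>> current = new ArrayList<>();
        current.add(new HashMap<>());
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : current) {
                for (String[] triple : TRIPLES) {
                    Map<String, String> extended = extend(binding, pattern, triple);
                    if (extended != null) next.add(extended);
                }
            }
            intermediate += next.size();
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        load(500);
        int posted = solve(AS_POSTED).size();
        System.out.println("as posted: " + posted + " solutions, "
                + intermediate + " intermediate bindings");
        int reordered = solve(SELECTIVE_FIRST).size();
        System.out.println("selective first: " + reordered + " solutions, "
                + intermediate + " intermediate bindings");
    }
}
```

With 100 families, both orders return 20 solutions, but the as-posted order builds 220 intermediate bindings while the selective-first order builds 60. Real stores use indexes, so absolute costs differ, but the shape of the effect is the same.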

On Friday 24 March 2006 22:56, e7_edim7_am6_e wrote: (Stuff) OK, I pulled your source into Eclipse and ran it and got the following output (I added one blank...
Chris Dollin
anover_alias
Mar 27, 2006
2:10 pm

Yes, I think so... here is my run. Please note that I am on Linux using an AMD 64 processor. Note: please ignore "Very Fast". It is not an equivalent query that...
e7_edim7_am6_e
Mar 27, 2006
3:02 pm

Wait a minute... I screwed up. The runs are different! I get this... ... And you get this......
e7_edim7_am6_e
Mar 27, 2006
5:11 pm

...and I just saw that ARQ 1.2 is not the latest and greatest. Furthermore, I just installed 1.3 and it fixed my problem (see below). It seems that ARQ 1.3...
e7_edim7_am6_e
Mar 27, 2006
5:25 pm

... Good - but I can't claim it's ARQ - maybe ARQ 1.3 picked up an optimized version of Jena so the combination of triple order (minor benefit) and the Jena...
Seaborne, Andy
andyseaborne
Mar 27, 2006
8:11 pm

Andy, thanks for the great support. I'm pretty sure it's ARQ because I only replaced arq-1.2.jar with arq.jar (from 1.3). I did not touch Jena.jar. In any...
e7_edim7_am6_e
Mar 28, 2006
3:28 pm

... If odd things start happening (e.g. Java Errors about methods not found), do try the jena.jar from the ARQ distribution because ARQ is tested and...
Seaborne, Andy
andyseaborne
Mar 28, 2006
3:40 pm

... Yes, that's right. ARQ (the general purpose query engine at least) passes each basic graph pattern (sequence of triple patterns) to Jena's graph query...
Seaborne, Andy
andyseaborne
Mar 9, 2006
10:04 am