Tag Archives: sparql

How do you argue a query is impossible in SPARQL?

I’ve come to believe that I’m stumped in querying the existence of (s,t)-paths with SPARQL 1.0 or the k nearest neighbors with SPARQL 1.1 because of limitations of the query language, not limitations in my query-writing ability. But how to prove it? Or even provide some evidence? Tell me on cstheory.stackexachange.

Maybe this paper is relevant: Semantics and Complexity of SPARQL

Comments Off on How do you argue a query is impossible in SPARQL?

Filed under TCS

k-NN in SPARQL?

Is a SPARQL query capable of finding the k nearest neighbors for several vectors simultaneously? I don’t think so, but I’ve been wrong before. Tell me on Stack Overflow.

Comments Off on k-NN in SPARQL?

Filed under machine learning

Power of SPARQL?

Is a SPARQL query computationally powerful enough to test (s,t)-connectivity? I don’t think so, but I don’t understand the mysterious PropertyPath, and even without it, I’m not sure. Tell me on Stack Overflow.

1 Comment

Filed under combinatorial optimization

Counting triangles with SPARQL vs NetworkX

Before we begin, I should disclose: I am not a believer in the semantic web. I am not excited by the promise of linked data.

As I mentioned when I was getting started with SPARQL a few posts ago, I’ve been convinced to give this a try because Cray, Inc. has a new type of computer and this may be the lowest-overhead way for me to use it for global health metrics.

With that out of the way, here is one way that I may test drive their machine: counting triangles in massive graphs. I’ll abbreviate the introduction, counting triangles is an area that there has been a fair amount of work on in the last decade. Google scholar can get you more up-to-date than I, although I was looking into this matter towards the end of my post-doc at Microsoft Research. It is a good simplification of a more general subgraph counting challenge, and it can probably be justified in its own right as a metric of “cohesion” in social networks.

Another appealing aspect of triangle counting is that it is easily done with the Python NetworkX package:

import networkx as nx
G = nx.random_graphs.barabasi_albert_graph(10000, 5)
print 'Top ten triangles per vertex:',
print sorted(nx.triangles(G).values(), reverse=True)[:10]

It is not as easy, but also not much harder to count triangles per vertex in SPARQL (once you figure out how to transfer a graph from Python to a SPARQL server):

SELECT ?s (COUNT(?o) as ?hist)
WHERE { ?s  ?p  ?o . 
        ?o  ?p  ?oo .
        ?s  ?p  ?oo .
      }
GROUP BY ?s
ORDER BY DESC(?hist) LIMIT 10

I compared the speed of these approaches for a range of graph sizes, but just using the Jena Fuseki server for the SPARQL queries. Presumably, the Cray uRiKa will be much faster. I look forward to finding out!

time_ratio

NetworkX is faster than Fuseki, 2-4x faster. But more important is the next plot, showing that both seem to take time super-linear in instance size, possibly with different exponents:
time_2

Comments Off on Counting triangles with SPARQL vs NetworkX

Filed under machine learning

Before getting started with the Semantic Web

I mentioned the websearching difficulty I found when getting started with Semantic Web recently, but there was one good lead I found: an O’Reilly book called Learning SPARQL, and associated blog by author Bob DuCharme. I was particularly interested in an essay on the culture gap between Semantic Web and Big Data.

I can’t believe I just said I’m particularly interested in an essay about databases!

Comments Off on Before getting started with the Semantic Web

Filed under machine learning

Getting started with the Semantic Web

I’ve been getting started with a new project, for which I need to get up to speed on this whole semantic web/linked data business. I was as let down by the results of my websearching as I was elevated by the tagged-and-up-voted material on Stack Overflow. Here is a little link library:

Why am I doing this? Because supercomputer company Cray, Inc. has built a new type of supercomputer which is optimized for graph searching, and searching RDF with SPARQL is a low-overhead way to use it. And they are running a contest for scientists to do something interesting with their new tool, in which I am a contestant.

3 Comments

Filed under machine learning