Tag Archives: sparql

How do you argue a query is impossible in SPARQL?

I’ve come to believe that I’m stumped in querying the existence of (s,t)-paths with SPARQL 1.0 or the k nearest neighbors with SPARQL 1.1 because of limitations of the query language, not limitations in my query-writing ability. But how to prove it? Or even provide some evidence? Tell me on cstheory.stackexachange.

Maybe this paper is relevant: Semantics and Complexity of SPARQL

Comments Off

Filed under TCS

k-NN in SPARQL?

Is a SPARQL query capable of finding the k nearest neighbors for several vectors simultaneously? I don’t think so, but I’ve been wrong before. Tell me on Stack Overflow.

Comments Off

Filed under machine learning

Power of SPARQL?

Is a SPARQL query computationally powerful enough to test (s,t)-connectivity? I don’t think so, but I don’t understand the mysterious PropertyPath, and even without it, I’m not sure. Tell me on Stack Overflow.

1 Comment

Filed under combinatorial optimization

Counting triangles with SPARQL vs NetworkX

Before we begin, I should disclose: I am not a believer in the semantic web. I am not excited by the promise of linked data.

As I mentioned when I was getting started with SPARQL a few posts ago, I’ve been convinced to give this a try because Cray, Inc. has a new type of computer and this may be the lowest-overhead way for me to use it for global health metrics.

With that out of the way, here is one way that I may test drive their machine: counting triangles in massive graphs. I’ll abbreviate the introduction, counting triangles is an area that there has been a fair amount of work on in the last decade. Google scholar can get you more up-to-date than I, although I was looking into this matter towards the end of my post-doc at Microsoft Research. It is a good simplification of a more general subgraph counting challenge, and it can probably be justified in its own right as a metric of “cohesion” in social networks.

Another appealing aspect of triangle counting is that it is easily done with the Python NetworkX package:

import networkx as nx
G = nx.random_graphs.barabasi_albert_graph(10000, 5)
print 'Top ten triangles per vertex:',
print sorted(nx.triangles(G).values(), reverse=True)[:10]

It is not as easy, but also not much harder to count triangles per vertex in SPARQL (once you figure out how to transfer a graph from Python to a SPARQL server):

SELECT ?s (COUNT(?o) as ?hist)
WHERE { ?s  ?p  ?o . 
        ?o  ?p  ?oo .
        ?s  ?p  ?oo .
      }
GROUP BY ?s
ORDER BY DESC(?hist) LIMIT 10

I compared the speed of these approaches for a range of graph sizes, but just using the Jena Fuseki server for the SPARQL queries. Presumably, the Cray uRiKa will be much faster. I look forward to finding out!

time_ratio

NetworkX is faster than Fuseki, 2-4x faster. But more important is the next plot, showing that both seem to take time super-linear in instance size, possibly with different exponents:
time_2

Comments Off

Filed under machine learning

Before getting started with the Semantic Web

I mentioned the websearching difficulty I found when getting started with Semantic Web recently, but there was one good lead I found: an O’Reilly book called Learning SPARQL, and associated blog by author Bob DuCharme. I was particularly interested in an essay on the culture gap between Semantic Web and Big Data.

I can’t believe I just said I’m particularly interested in an essay about databases!

Comments Off

Filed under machine learning

Getting started with the Semantic Web

I’ve been getting started with a new project, for which I need to get up to speed on this whole semantic web/linked data business. I was as let down by the results of my websearching as I was elevated by the tagged-and-up-voted material on Stack Overflow. Here is a little link library:

Why am I doing this? Because supercomputer company Cray, Inc. has built a new type of supercomputer which is optimized for graph searching, and searching RDF with SPARQL is a low-overhead way to use it. And they are running a contest for scientists to do something interesting with their new tool, in which I am a contestant.

3 Comments

Filed under machine learning