Jun 16, 2014
 

Olaf Hartig has developed a formal model of the “Reification Done Right” concepts [1]. The model formalizes extensions to both RDF (known as RDF*) and SPARQL (known as SPARQL*). These extensions are backwards compatible with the RDF data model and the SPARQL query language, and they offer an alternative perspective on RDF Reification. RDF* and SPARQL* are introduced and formally described in “Foundations of an Alternative Approach to Reification in RDF” [1].

The key contributions of this paper are:

  • Formal extensions of the RDF data model and the SPARQL algebra that reconcile RDF Reification with statement-level metadata;
  • An extended syntax for Turtle that permits easy interchange of statements about statements;
  • An extended syntax for SPARQL that makes it easy to express queries and data for statements about statements; and
  • Rewrite rules that may be used to translate RDF* into RDF and SPARQL* into SPARQL.
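To make the RDF*-to-RDF direction of those rewrite rules concrete, here is a minimal Python sketch that maps each embedded triple onto the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). It is an illustration under stated assumptions, not the paper's formal rules: terms are plain strings in Turtle-like notation, an embedded triple is a Python 3-tuple, the blank-node labels `_:b0`, `_:b1`, and so on are generated and assumed not to clash with labels already in the data, and, following the reading that RDF* data such as the :bob example later in this post also asserts its embedded triples, each embedded triple is emitted as an ordinary triple too (the hypothetical `assert_embedded` flag turns this off).

```python
from itertools import count

# Standard W3C RDF reification vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF + "type"
RDF_STATEMENT = RDF + "Statement"
RDF_SUBJECT = RDF + "subject"
RDF_PREDICATE = RDF + "predicate"
RDF_OBJECT = RDF + "object"


def reify(triples, assert_embedded=True):
    """Rewrite RDF* triples into plain RDF triples.

    A subject or object may be a 3-tuple (an embedded triple). Each distinct
    embedded triple is replaced by one blank node described with the
    reification vocabulary; nested embeddings are handled recursively.
    `assert_embedded` (hypothetical flag) also emits the embedded triple itself.
    """
    node_of = {}  # embedded triple -> blank node label
    out = []
    labels = count()

    def term(t):
        if not isinstance(t, tuple):
            return t  # IRI, literal, or blank node: unchanged
        if t not in node_of:
            s, p, o = term(t[0]), t[1], term(t[2])
            node = f"_:b{next(labels)}"  # assumes no clash with input labels
            node_of[t] = node
            out.extend([
                (node, RDF_TYPE, RDF_STATEMENT),
                (node, RDF_SUBJECT, s),
                (node, RDF_PREDICATE, p),
                (node, RDF_OBJECT, o),
            ])
            if assert_embedded:
                out.append((s, p, o))
        return node_of[t]

    for s, p, o in triples:
        out.append((term(s), p, term(o)))
    # Drop duplicates (e.g. a triple both asserted and embedded), keep order.
    return list(dict.fromkeys(out))


data = [
    (":bob", "foaf:name", '"Bob"'),
    ((":bob", "foaf:age", "23"), "dct:creator", "<http://example.com/crawlers#c1>"),
    ((":bob", "foaf:age", "23"), "dct:source", "<http://example.net/homepage-listing.html>"),
]

for triple in reify(data):
    print(*triple)
```

Memoizing one blank node per embedded triple is what makes the two metadata triples about `<<:bob foaf:age 23>>` describe a single rdf:Statement resource rather than two unrelated ones.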

RDF* and SPARQL* allow statements to appear as Subjects or Objects in other statements. Statements about these “inline” statements can be interpreted as statements about statements, and the paper shows that this is equivalent to making statements about reified RDF statement models. For example, the following statements declare a name for the resource :bob and an age for :bob, and provide assertions about how and where that age was obtained:

:bob foaf:name "Bob" .
<<:bob foaf:age 23>> dct:creator <http://example.com/crawlers#c1> ;
                     dct:source <http://example.net/homepage-listing.html> .

The data can then be queried using:

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   <<?bob foaf:age ?age>> dct:source ?src .
}

In both cases, the << >> notation denotes a statement appearing as the Subject or Object of another statement. Statements may also be bound to variables, as shown in this alternative formulation of the same query:

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   BIND( <<?bob foaf:age ?age>> AS ?t ) .
   ?t dct:source ?src .
}
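The behavior of both query forms can be approximated with a small in-memory matcher. The Python sketch below is a hypothetical illustration, not bigdata's query engine: a graph is a set of triples whose subject or object may itself be a triple, variables are strings starting with `?`, a nested tuple in a pattern plays the role of `<< >>`, and a `Bind` object stands in for `BIND(<< >> AS ?t)` by binding the variable to each asserted triple that matches. It assumes, as the example data states, that the embedded triple :bob foaf:age 23 is itself asserted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bind:
    """Stand-in for BIND(<<pattern>> AS var)."""
    pattern: tuple
    var: str


def is_var(x):
    return isinstance(x, str) and x.startswith("?")


def match(pattern, value, binding):
    """Unify a pattern term with a data term; return the extended binding or None."""
    if is_var(pattern):
        if pattern in binding:
            return binding if binding[pattern] == value else None
        return {**binding, pattern: value}
    if isinstance(pattern, tuple):  # triple pattern, possibly nested (<< >>)
        if not isinstance(value, tuple) or len(value) != len(pattern):
            return None
        for p, v in zip(pattern, value):
            binding = match(p, v, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == value else None


def evaluate(graph, patterns):
    """Join the patterns left to right, like a basic graph pattern."""
    solutions = [{}]
    for pat in patterns:
        extended = []
        for b in solutions:
            for triple in graph:
                if isinstance(pat, Bind):
                    b2 = match(pat.pattern, triple, b)
                    b2 = None if b2 is None else match(pat.var, triple, b2)
                else:
                    b2 = match(pat, triple, b)
                if b2 is not None:
                    extended.append(b2)
        solutions = extended
    return solutions


AGE = (":bob", "foaf:age", "23")
graph = {
    (":bob", "foaf:name", '"Bob"'),
    AGE,  # the embedded triple is assumed to be asserted as well
    (AGE, "dct:creator", "<http://example.com/crawlers#c1>"),
    (AGE, "dct:source", "<http://example.net/homepage-listing.html>"),
}

inline_query = [
    ("?bob", "foaf:name", '"Bob"'),
    (("?bob", "foaf:age", "?age"), "dct:source", "?src"),
]
bind_query = [
    ("?bob", "foaf:name", '"Bob"'),
    Bind(("?bob", "foaf:age", "?age"), "?t"),
    ("?t", "dct:source", "?src"),
]

for q in (inline_query, bind_query):
    print([(b["?age"], b["?src"]) for b in evaluate(graph, q)])
```

Both pattern lists produce the same single (?age, ?src) row, which mirrors the point that the BIND form is only an alternative syntax for the inline << >> pattern.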

The paper proves that these examples are equivalent to the corresponding data and queries expressed using RDF Reification. That is, RDF Reification already gives us a mechanism to represent, interchange, and query statements about statements. However, the paper also shows that statements about statements may be modeled and queried within the database using a wide variety of physical schemas, which can offer much greater efficiency and data density than naive indexing of RDF statement models. This gives database designers enormous freedom in how they represent statements about statements, and it helps to counter the impression that RDF databases are necessarily a poor fit for problems that require link attributes. For example, any of the following physical schemas could be used to represent these statements about statements:

  • Explicitly modeling the statements about statements as reified RDF statement models;
  • Associating a “statement identifier” with each statement in the database and then using that identifier to represent statements about statements;
  • Directly embedding the statement “:bob foaf:age 23” in the representation of each statement about that statement (inlining it within the statement indices using variable-length, recursively embedded encodings of the Subject and Object of a statement); and
  • Extending the (s,p,o) table with additional columns, in this case dct:creator and dct:source. This can be advantageous when a metadata predicate has a maximum cardinality of one and is used for most statements in the database (for example, to build an efficient bi-temporal database with statement-level metadata columns for the business-start-time, business-end-time, and transaction-time of each assertion).
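Two of these layouts can be sketched in a few lines of Python. Both classes are hypothetical illustrations, not bigdata's index structures: `SidStore` gives every statement an integer identifier and stores metadata rows that point at that identifier (the statement-identifier option, where the embedded statement is stored, and therefore asserted, once), while `WideStore` keeps one row per base statement with a column for each single-valued metadata predicate (the extra-columns option). The `COLUMNS` mapping for dct:creator and dct:source is an assumption for this example, and only embedded statements in subject position are handled by the wide layout.

```python
from dataclasses import dataclass
from typing import Optional


class SidStore:
    """Statement-identifier layout (hypothetical): each distinct statement gets
    an integer SID; an embedded statement is stored, and therefore asserted,
    once and referenced as ("sid", n) by the statements about it."""

    def __init__(self):
        self.sid_of = {}  # encoded (s, p, o) -> SID
        self.rows = []    # SID -> encoded (s, p, o)

    def _encode(self, term):
        return ("sid", self.add(*term)) if isinstance(term, tuple) else term

    def add(self, s, p, o):
        key = (self._encode(s), p, self._encode(o))
        if key not in self.sid_of:
            self.sid_of[key] = len(self.rows)
            self.rows.append(key)
        return self.sid_of[key]

    def about(self, triple):
        """(p, o) pairs stated about a flat, already-stored triple."""
        sid = self.sid_of.get(triple)
        return [(p, o) for s, p, o in self.rows if sid is not None and s == ("sid", sid)]


# Extra-columns layout (hypothetical): metadata predicates with a maximum
# cardinality of one become extra columns on the statement row.
COLUMNS = {"dct:creator": "creator", "dct:source": "source"}


@dataclass
class WideRow:
    s: str
    p: str
    o: str
    creator: Optional[str] = None
    source: Optional[str] = None


class WideStore:
    def __init__(self):
        self.rows = {}  # (s, p, o) -> WideRow

    def add(self, s, p, o):
        if not isinstance(s, tuple):  # ordinary statement
            self.rows.setdefault((s, p, o), WideRow(s, p, o))
            return
        if p not in COLUMNS:
            raise ValueError(f"no column for metadata predicate {p}")
        # The embedded statement's row is created if it does not exist yet.
        row = self.rows.setdefault(s, WideRow(*s))
        column = COLUMNS[p]
        current = getattr(row, column)
        if current is not None and current != o:
            raise ValueError(f"{p} is single-valued, but {s} already has {current}")
        setattr(row, column, o)


DATA = [
    (":bob", "foaf:name", '"Bob"'),
    ((":bob", "foaf:age", "23"), "dct:creator", "<http://example.com/crawlers#c1>"),
    ((":bob", "foaf:age", "23"), "dct:source", "<http://example.net/homepage-listing.html>"),
]

sids, wide = SidStore(), WideStore()
for triple in DATA:
    sids.add(*triple)
    wide.add(*triple)

print(sids.rows)
print(wide.rows[(":bob", "foaf:age", "23")])
```

The wide layout rejects a second, different dct:source value for the same statement, which is exactly the maximum-cardinality-of-one condition under which that schema pays off; the SID layout has no such restriction but needs an extra row and an identifier lookup per metadata assertion.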

By clarifying the formal semantics of RDF Reification and offering a simplified syntax for data interchange, query, and update, RDF* and SPARQL* let database designers and users model domains that require statement-level metadata more easily and more confidently. There is a long list of such domains, including event models, domains that require link attributes, sparse matrices, and the property graph model.

Bigdata supports RDF* and SPARQL* for the efficient interchange, query, and update of statements about statements. Today, this is enabled through the “SIDS” (statement identifiers) option:

com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=true

This enables the historical mechanism for efficient statements about statements in bigdata. In the future, we plan to add support for RDF* and SPARQL* to the quads mode of the platform as well, which will allow statement-level metadata to coexist seamlessly with the named graphs model.

Thanks,
Bryan

[1] “Foundations of an Alternative Approach to Reification in RDF”, arXiv:1406.3399. http://arxiv.org/abs/1406.3399