There is a lot of misinformation out there when it comes to Property Graphs versus Semantic/RDF Graphs. Perhaps the chief example is the claim that RDF doesn’t support properties. This quick blog post shows that you can indeed model your RDF graph to provide that functionality elegantly, without jumping through hoops.
Note: this blog post is not about reification (see: https://jena.apache.org/documentation/notes/reification.html). I find traditional reification as typically implemented in RDF to be inelegant and inefficient with regard to data storage. Apparently the Neo4j guys do too, and they’ve used the inelegance of reification to lambaste RDF as a whole. Instead I’m going to show you what many in the industry believe is a better, more efficient way to add properties to the statements in your RDF project.
For starters, let’s recall that in RDF a triple consisting of “Subject, Predicate, Object” is commonly known as a STATEMENT, e.g. “Michael Likes Semantics” or “Michael Speaks English”. As such, our RDF triple store can be thought of as a large collection of statements. In most knowledge systems it is useful to be able to add citations (or other properties) to any statement so that the consumer of the statement has a means to ascertain its quality. A good example of this is Wikipedia: without citations, the content remains of unknown provenance and validity. Such a feature would be especially important in an Enterprise Knowledge Graph that aggregates its data from various silos throughout the organization.
Another piece of the puzzle is to recognize that the majority of commercial RDF “triple stores” are in fact “quad stores” (Oracle Spatial & Graph among them). So what is this 4th column used for? It was originally envisioned to allow for partitioning your triple store into sub-graphs, the W3C standard commonly known as “named graphs”.
Breaking somewhat with tradition, we will be leveraging this 4th (named graph) column to add citations (a.k.a. properties) to our triple statement.
Subject (URI) | Predicate (URI) | Object (URI or label) | NamedGraph (URI) |
---|---|---|---|
[MySubject] | [MyPredicate] | MyObject | [MyStatement] |
[MyStatement] | [Key1] | Value1 | |
[MyStatement] | [Key2] | Value2 | |
[MyStatement] | [Key3] | Value3 | |
In the table above, [ ] denotes a URI. Subjects, Predicates, and Named Graphs must be URIs, whereas Objects can be either URIs or labels (literal values).
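To make the table concrete, here is a minimal sketch in plain Python of the same layout. It is a toy in-memory model, not any triple store's API; the `Quad` type, the `statement_properties` helper, and the `http://example.org/` URIs are all made up for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: Optional[str] = None  # None means "no named graph"


EX = "http://example.org/"  # hypothetical namespace, for illustration only

quads: List[Quad] = [
    # The statement itself; its 4th column holds the statement's URI.
    Quad(EX + "Michael", EX + "speaks", "English", EX + "statement/1"),
    # Properties (citations) hang off the statement URI as ordinary triples.
    Quad(EX + "statement/1", EX + "source", "HR system"),
    Quad(EX + "statement/1", EX + "confidence", "0.9"),
]


def statement_properties(
    quads: List[Quad], subject: str, predicate: str, obj: str
) -> Dict[str, str]:
    """Return the key/value properties attached to the matching statement."""
    # Step 1: find the statement URI(s) in the named-graph column.
    statement_ids = {
        q.graph
        for q in quads
        if (q.subject, q.predicate, q.obj) == (subject, predicate, obj)
        and q.graph is not None
    }
    # Step 2: collect every triple whose subject is one of those URIs.
    return {q.predicate: q.obj for q in quads if q.subject in statement_ids}


props = statement_properties(quads, EX + "Michael", EX + "speaks", "English")
print(props)
```

The lookup is a single hop: the value in the 4th column of the statement row is the subject of its property rows.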
In Oracle’s implementation, if you do not specify a FROM or FROM NAMED in your SPARQL query, the default graph contains the union of all named graph triples and all unnamed graph triples (unless you turn this off with a specific option through SEM_MATCH or Jena). This means that you can just run your queries as if everything were in the default graph. More information here: https://docs.oracle.com/en/database/oracle/oracle-database/19/rdfrm/rdf-semantic-graph-overview.html#GUID-45654D98-A2B5-4815-949D-2F48FA66DA51
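The "union default graph" behaviour can be pictured with a small toy model in plain Python. This is only a sketch of the idea, not Oracle's SEM_MATCH or Jena API; the `match_union` helper and the example URIs are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# (subject, predicate, object, named graph or None)
Quad = Tuple[str, str, str, Optional[str]]

EX = "http://example.org/"  # hypothetical namespace

QUADS: List[Quad] = [
    (EX + "Michael", EX + "likes", "Semantics", EX + "stmt/1"),  # in a named graph
    (EX + "Michael", EX + "speaks", "English", None),            # no named graph
    (EX + "stmt/1", EX + "citedFrom", "staff survey", None),     # property of stmt/1
]


def match_union(
    quads: List[Quad],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Tuple[str, str, str]]:
    """Match a triple pattern while ignoring the graph column.

    None acts as a wildcard. Because the graph column is ignored, named and
    unnamed triples are searched together, as in a union default graph.
    """
    return [
        (qs, qp, qo)
        for (qs, qp, qo, _graph) in quads
        if (s is None or qs == s)
        and (p is None or qp == p)
        and (o is None or qo == o)
    ]


about_michael = match_union(QUADS, s=EX + "Michael")
for triple in about_michael:
    print(triple)
```

Both statements about Michael come back, even though only one of them carries a statement URI in the 4th column.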
Advantages of this approach:
- More compact – uses less storage than other approaches
- More efficient way for adding properties to triples – i.e. two-way join vs. three-way join
- Doesn’t affect entailment
- Makes it straightforward to exchange data to/from property graphs
- In Oracle’s implementation, doesn’t require changes to your SPARQL
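As a rough illustration of the storage point, the sketch below counts the rows needed to attach three properties to one statement, first with the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) and then with the named-graph layout described in this post. The helper names and example URIs are hypothetical, and real stores may encode rows differently, so treat the numbers as indicative only.

```python
from typing import Dict, List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace


def reified_rows(
    s: str, p: str, o: str, stmt: str, props: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Rows needed by standard reification (the asserted triple not included)."""
    rows = [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
    ]
    rows += [(stmt, key, value) for key, value in props.items()]
    return rows


def named_graph_rows(
    s: str, p: str, o: str, stmt: str, props: Dict[str, str]
) -> List[Tuple[str, str, str, Optional[str]]]:
    """Rows needed when the statement URI sits in the 4th (named graph) column."""
    rows: List[Tuple[str, str, str, Optional[str]]] = [(s, p, o, stmt)]
    rows += [(stmt, key, value, None) for key, value in props.items()]
    return rows


props = {EX + "key1": "Value1", EX + "key2": "Value2", EX + "key3": "Value3"}
args = (EX + "MySubject", EX + "MyPredicate", "MyObject", EX + "MyStatement", props)

reified_count = len(reified_rows(*args))
named_graph_count = len(named_graph_rows(*args))
print(reified_count, named_graph_count)
```

Note that the reification rows only describe the statement; asserting the triple itself would add one more row on that side.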
One Caveat:
If your application makes use of or depends on Named Graphs for creating sub-graphs, then this approach will not work for you.