
Best Practices from Oracle Development's A‑Team

"Extending" an existing RDF Knowledge Graph

Michael J. Sullivan
Principal Cloud Solutions Architect

The basic idea behind an RDF knowledge graph is that (a) every domain-specific graph consists of statements, each taking the form of a triple: Subject, Predicate, Object; and (b) for the most part, those domain-specific statements are maintained together in one graph model. So far so good. But what if we wanted to "extend" our knowledge graph to have some metadata about the metadata? Do we need to shove that into our graph model as well (and thereby clog it up)? Or could we somehow extend our knowledge graph by placing the supporting metadata in a secondary source? There are all sorts of use cases for this, but perhaps the easiest to understand would be around concepts like provenance and governance, specifically the provenance supporting each domain-specific statement: Where did it come from? When was it added? Who added it? This is not unlike a content management system wherein one has access to both the CMS data (e.g. the article text, images, categories, etc.) and the CMS metadata (e.g. created_by, created_date, workflow_status, etc.).

It is often desirable to separate out the data from the metadata, but also have it readily linkable/queryable in order to assist with managing your assets. However, this is much harder to do in standard RDF as there is no such thing as a primary key for each triple (something I think should have been there from the beginning). But that doesn't mean we can't try.

In a previous blog (see: https://www.ateam-oracle.com/semanticrdf-“properties”) I showed how we could leverage the NamedGraph feature of RDF Quads to add edge properties to our graph model. In this blog, we are going to do something similar, but instead of placing those edge properties WITHIN our model, we will be placing the triple-related properties OUTSIDE in their own separate model(s). Just like any other RDF endpoint, these properties are easily and independently queryable via SPARQL. And when needed, we can easily "join" them together with our original knowledge graph through the use of either SPARQL (locally overloaded) federation or, better yet, a combined virtual view unique to Oracle Spatial & Graph. This allows you to maintain multiple models independently of each other yet seamlessly combine and query them when needed without using federation, a pattern which can also be used for near-instantaneous updates. Cool stuff.

What follows are two sample models where MYMODEL contains the primary data keyed using a NamedGraph URI while MYMODEL$PROVENANCE contains the related provenance metadata.

MYMODEL

Subject           Predicate            Object              NamedGraph
:_SEMANTiCS_2019  a                    :Conference
:_Michael         :presentedKeynoteAt  :_SEMANTiCS_2019    :E1
:E1               :location            :_Karlsruhe_DE

MYMODEL$PROVENANCE

Subject           Predicate            Object 
:E1               :dateCreated         "2018-04-09T10:00:00"^^xsd:dateTime
:E1               :createdBy           :_PublicRelationsTeam
:E1               :workflowStatus      "Under Review"

Here we are using :E1 somewhat like a primary key. Note, however, that not every statement in MYMODEL requires provenance (for example, the :_SEMANTiCS_2019 statement in the above model has none). Provenance is only needed for those key triples that require it or could benefit from it, whether due to regulatory compliance, governance, or anything else that enriches the Knowledge Graph.
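To make the "primary key" idea concrete, here is a minimal sketch in plain Python (not Oracle code) that treats the two sample models as lists of statements and looks up provenance through the NamedGraph URI. The Quad type and the provenance_for helper are hypothetical names invented for this illustration; only the data comes from the tables above.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One RDF statement; `graph` is the NamedGraph URI, or None if absent."""
    subject: str
    predicate: str
    obj: str
    graph: Optional[str] = None


# MYMODEL: the domain statements. One of them is keyed by the named graph :E1.
MYMODEL = [
    Quad(":_SEMANTiCS_2019", "a", ":Conference"),
    Quad(":_Michael", ":presentedKeynoteAt", ":_SEMANTiCS_2019", ":E1"),
    Quad(":E1", ":location", ":_Karlsruhe_DE"),
]

# MYMODEL$PROVENANCE: metadata whose subject is that same :E1 URI.
PROVENANCE = [
    Quad(":E1", ":dateCreated", '"2018-04-09T10:00:00"^^xsd:dateTime'),
    Quad(":E1", ":createdBy", ":_PublicRelationsTeam"),
    Quad(":E1", ":workflowStatus", '"Under Review"'),
]


def provenance_for(statement: Quad) -> dict:
    """Return the provenance properties of a statement, keyed by predicate.

    A statement without a named graph has no key to join on, so it yields {}.
    """
    if statement.graph is None:
        return {}
    return {q.predicate: q.obj for q in PROVENANCE if q.subject == statement.graph}


# Show which statements carry provenance and which do not.
for stmt in MYMODEL:
    print(stmt.subject, stmt.predicate, stmt.obj, "->", provenance_for(stmt))
```

Only the keynote statement, the one tagged with :E1, comes back with provenance; the other two statements return an empty result because they have no NamedGraph URI to join on.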

Querying each model independently is straightforward as each is just an "ordinary" RDF graph, but what fun is that? To combine them into a single virtual view in Oracle you would first execute something like the following:

-- Combine both models into one queryable virtual model named VM1
BEGIN
  sem_apis.create_virtual_model('VM1', sem_models('MYMODEL', 'MYMODEL$PROVENANCE'),
    network_owner=>'RDFUSER', network_name=>'NET1');
END;
/

As a result, your new virtual VM1 model would now look like the following:

Subject           Predicate            Object              NamedGraph
:_SEMANTiCS_2019  a                    :Conference
:_Michael         :presentedKeynoteAt  :_SEMANTiCS_2019    :E1
:E1               :location            :_Karlsruhe_DE
:E1               :dateCreated         "2018-04-09T10:00:00"^^xsd:dateTime
:E1               :createdBy           :_PublicRelationsTeam
:E1               :workflowStatus      "Under Review"

Notice that as everything is effectively in one model, there is no need for federation, making querying much easier.
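As a rough illustration of why the combined view makes querying easier, the following plain-Python sketch (again, not Oracle code) models the virtual model as a simple union of the two statement sets and then answers "who presented a keynote, and what is the provenance of that statement?" in a single pass. The function name keynotes_with_provenance is hypothetical, and the union is only a stand-in for what the virtual model does inside the database. The loop loosely mirrors a SPARQL pattern that binds the graph name of the keynote statement and then matches that same name as the subject of the provenance triples.

```python
# Each statement is (subject, predicate, object, named_graph);
# None means the statement has no NamedGraph URI.
MYMODEL = {
    (":_SEMANTiCS_2019", "a", ":Conference", None),
    (":_Michael", ":presentedKeynoteAt", ":_SEMANTiCS_2019", ":E1"),
    (":E1", ":location", ":_Karlsruhe_DE", None),
}

MYMODEL_PROVENANCE = {
    (":E1", ":dateCreated", '"2018-04-09T10:00:00"^^xsd:dateTime', None),
    (":E1", ":createdBy", ":_PublicRelationsTeam", None),
    (":E1", ":workflowStatus", '"Under Review"', None),
}

# Stand-in for the virtual model: a union view over the member models.
VM1 = MYMODEL | MYMODEL_PROVENANCE


def keynotes_with_provenance(model):
    """Join each keynote statement to its provenance via its graph name."""
    rows = []
    for who, predicate, event, graph in model:
        if predicate != ":presentedKeynoteAt" or graph is None:
            continue
        # Collect every property whose subject is the statement's graph name.
        props = {p: o for s, p, o, _ in model if s == graph}
        if ":createdBy" in props and ":workflowStatus" in props:
            rows.append((who, event, props[":createdBy"], props[":workflowStatus"]))
    return rows


print(keynotes_with_provenance(VM1))
print(keynotes_with_provenance(MYMODEL))  # no provenance without the union
```

Run against MYMODEL alone, the same function finds nothing, because the provenance statements live in the other model. Against the combined view it returns the keynote together with who created the statement and its workflow status, with no federation step in between.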

In summary, using a virtual model approach together with NamedGraph URIs to extend an existing knowledge graph can provide several benefits:

  • It can allow one to seamlessly extend an existing model without “clogging” it up
  • It can simplify management of access privileges for semantic data
  • It can facilitate rapid updates to semantic models
  • It can simplify query specification because querying a virtual model is equivalent to querying multiple models in a SEM_MATCH query

Note also that in this pattern you are free to make use of NamedGraphs in your extension models for any other purpose such as creating subgraphs of the provenance metadata. Happy modeling!
