RDF Datasets, Named and Default Graphs: Concepts and How to Query Them

May 5, 2020 | 5 minute read
Emma Thomas
Principal Solutions Architect

As stated in the W3C Recommendation (https://www.w3.org/TR/rdf11-concepts/):

“The Resource Description Framework (RDF) is a framework for representing information in the Web. This document defines an abstract syntax (a data model) which serves to link all RDF-based languages and specifications. The abstract syntax has two key data structures: RDF graphs are sets of subject-predicate-object triples, where the elements may be IRIs, blank nodes, or datatyped literals. They are used to express descriptions of resources. RDF datasets are used to organize collections of RDF graphs, and comprise a default graph and zero or more named graphs. …”

We will look at what these terms mean in closer detail and how they work inside the Oracle RDF graph database.

 

RDF Datasets

As a fundamental concept, RDF graphs are sets of triples (hence RDF databases are often referred to as triplestores). Because RDF identifies resources with URIs and gives them shared semantics, graphs are commonly linked with other graphs from multiple sources to explore relationships across data. It is also often necessary to work with multiple RDF graphs inside the same model. A set of RDF graphs within the same model is referred to as an RDF dataset.

An RDF dataset is a collection of RDF graphs within the same model.
RDF datasets have many uses; one suggestion from the W3C is to use them to hold snapshots of multiple RDF sources (https://www.w3.org/TR/rdf11-concepts/#dfn-rdf-source).


RDF Named Graphs

Inside an RDF dataset, all but one of the graphs are named graphs. Each has an associated IRI or blank node (https://en.wikipedia.org/wiki/Blank_node) that identifies it. There can be zero or more named graphs inside your model.

 

RDF Default/Unnamed Graph

Every RDF dataset contains exactly one graph which is unnamed. This graph has no associated IRI and is referred to as the default graph.

 

In summary: an RDF dataset is a collection of RDF graphs which:
    • Contains one default graph, which does not have a name
    • Contains zero or more named graphs, where each graph is identified by an IRI or blank node.
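
The structure summarised above can be sketched as a small data type. This is only an illustrative model in plain Python, not how Oracle stores a dataset; the class name RDFDataset and the example triples are hypothetical.

```python
from dataclasses import dataclass, field

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


@dataclass
class RDFDataset:
    """Exactly one unnamed default graph plus zero or more named graphs."""

    default_graph: set[Triple] = field(default_factory=set)
    # Graph name (an IRI or a blank node label) -> the triples in that graph.
    named_graphs: dict[str, set[Triple]] = field(default_factory=dict)


# Hypothetical example data: one unnamed triple and one named graph.
dataset = RDFDataset()
dataset.default_graph.add(("urn:alice", "urn:knows", "urn:bob"))
dataset.named_graphs["urn:g1"] = {("urn:bob", "urn:knows", "urn:carol")}

print(len(dataset.default_graph), sorted(dataset.named_graphs))
```

A freshly created RDFDataset() already has an (empty) default graph and no named graphs, which mirrors the "one default graph, zero or more named graphs" rule.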

 

Querying Named Graphs

A SPARQL query is executed against an RDF dataset, which comprises the default graph and the set of named graphs. Graph patterns that appear inside a GRAPH clause are matched against the set of named graphs, and graph patterns that do not appear inside a GRAPH clause are matched against the default graph.
The FROM and FROM NAMED keywords are used to construct the RDF dataset for a query.
In Oracle, the graphs and named_graphs SEM_MATCH parameters are used to construct the default graph and the set of named graphs for a given SEM_MATCH query. A summary of possible dataset configurations is shown below.


Example: Constructing the Dataset

Here we have one default unnamed graph and four named graphs {g1, g2, g3, g4} containing triples {t1, ..., t11}. From the SELECT query in the centre we can see that:
    • The graphs listed with plain “FROM” statements (FROM <urn:g1> and FROM <urn:g3>, in light grey) are merged to form the default graph for the duration of the query. This gives { t4, t5, t8, t9 }.
    • The graphs listed with “FROM NAMED” statements (in red) are treated as named graphs during query execution: { (<urn:g2>, { t6, t7 }), (<urn:g3>, { t8, t9 }), (<urn:g4>, { t10, t11 }) }.
*NOTE: the default graph in this instance is constructed for the query and exists only at query time.
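
The construction described in this example can be sketched in plain Python. This is a minimal simulation rather than Oracle's implementation: the function name build_query_dataset is hypothetical, triples are represented only by their labels, and placing t1 to t3 in the unnamed graph is an assumption (the text gives only the contents of g1 to g4).

```python
# Stored graphs from the example. Putting t1..t3 in the unnamed graph is an
# assumption; the text only lists the contents of g1..g4.
UNNAMED = {"t1", "t2", "t3"}
STORED = {
    "urn:g1": {"t4", "t5"},
    "urn:g2": {"t6", "t7"},
    "urn:g3": {"t8", "t9"},
    "urn:g4": {"t10", "t11"},
}


def build_query_dataset(from_graphs, from_named):
    """Return (default_graph, named_graphs) as seen by a single query."""
    # FROM: the query-time default graph is the union of the listed graphs.
    default_graph = set()
    for name in from_graphs:
        default_graph |= STORED[name]
    # FROM NAMED: each listed graph stays separate and keeps its name.
    named_graphs = {name: set(STORED[name]) for name in from_named}
    return default_graph, named_graphs


default_graph, named_graphs = build_query_dataset(
    from_graphs=["urn:g1", "urn:g3"],
    from_named=["urn:g2", "urn:g3", "urn:g4"],
)
print(sorted(default_graph))
print({name: sorted(triples) for name, triples in named_graphs.items()})
```

Note that g3 appears both in the default graph and as a named graph, and that the stored unnamed triples play no part once FROM clauses are given.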


Using the GRAPH Keyword and Basic Graph Patterns (BGP)

Active Graph

The “active graph” is the graph against which a given part of a query is matched during execution. The GRAPH keyword is used to control the active graph for different parts of a query.

Basic Graph Pattern 1 (BGP1)

Here no GRAPH clause is used, so the pattern is matched only against graphs g1 and g3, which for the duration of this query make up the default graph. The default graph is the union of g1 and g3, { <urn:g1> UNION <urn:g3> }, so the triples used for this BGP are { t4, t5, t8, t9 }.

Basic Graph Pattern 2 (BGP2)

This pattern sits inside a GRAPH clause, so it uses all graphs listed in “FROM NAMED” clauses, { <urn:g2>, <urn:g3>, <urn:g4> }. The triples used for this BGP are therefore { t6, t7, t8, t9, t10, t11 }.
Note that within a GRAPH clause the BGP is executed against each active graph separately (e.g. BGP2 against g2, then g3, then g4), and a subgraph match must occur within a single graph.
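
To illustrate why a match must fall within a single graph, here is a small sketch of a pattern matcher in plain Python. It is a toy, not a SPARQL engine: the function names and the example triples are hypothetical, and terms beginning with "?" are treated as variables.

```python
def match_bgp(patterns, graph):
    """Return every variable binding under which all patterns occur in `graph`.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = dict(binding)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # setdefault binds a new variable or returns its existing value.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


def match_in_each_named_graph(patterns, named_graphs):
    """Mimic GRAPH ?g { ... }: evaluate the pattern per graph, never across graphs."""
    results = []
    for name, graph in named_graphs.items():
        for binding in match_bgp(patterns, graph):
            results.append({"?g": name, **binding})
    return results


# Hypothetical data: the chain a -> b -> c is split across two graphs.
named_graphs = {
    "urn:g2": {("urn:a", "urn:knows", "urn:b")},
    "urn:g3": {("urn:b", "urn:knows", "urn:c"), ("urn:c", "urn:knows", "urn:d")},
}
chain = [("?x", "urn:knows", "?y"), ("?y", "urn:knows", "?z")]

results = match_in_each_named_graph(chain, named_graphs)
print(results)
```

Only the chain b -> c -> d, which lies entirely inside urn:g3, is returned. The chain a -> b -> c is not, because its two triples sit in different named graphs.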


Basic Graph Pattern 3 (BGP3)

This pattern sits inside a GRAPH clause that names graph <urn:g4>, so only that graph is used. This means that only these triples are used for this BGP: { t10, t11 }.


Basic Graph Pattern 4 (BGP4)

Here no triples or graphs are considered: the pattern sits inside a GRAPH clause that names g1, and g1 is not among the FROM NAMED graphs listed at the beginning of the query.

In Summary

Graph patterns that appear inside a GRAPH clause are matched against the set of named graphs, and graph patterns that do not appear inside a GRAPH clause are matched against the default graph, both as constructed at execution time.
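
The four cases can be sketched as one small selection function in plain Python. This is a simplified model under stated assumptions: the function name active_graphs is hypothetical, triples are represented only by their labels, and BGP2 is assumed to sit inside a GRAPH clause with a variable (written here as "?g").

```python
# Query-time dataset from the example: FROM <urn:g1>, FROM <urn:g3> and
# FROM NAMED <urn:g2>, <urn:g3>, <urn:g4>.
default_graph = {"t4", "t5", "t8", "t9"}
named_graphs = {
    "urn:g2": {"t6", "t7"},
    "urn:g3": {"t8", "t9"},
    "urn:g4": {"t10", "t11"},
}


def active_graphs(graph_clause=None):
    """Return the graphs a pattern is matched against, one entry per active graph.

    graph_clause=None    -> the pattern is outside any GRAPH clause
    graph_clause="?g"    -> GRAPH with a variable: every named graph in turn
    graph_clause="urn:x" -> GRAPH with a fixed IRI: that named graph, if present
    """
    if graph_clause is None:
        return [default_graph]
    if graph_clause.startswith("?"):
        return list(named_graphs.values())
    graph = named_graphs.get(graph_clause)
    return [] if graph is None else [graph]


bgp1 = active_graphs()           # no GRAPH clause
bgp2 = active_graphs("?g")       # assumed: GRAPH with a variable
bgp3 = active_graphs("urn:g4")   # GRAPH <urn:g4>
bgp4 = active_graphs("urn:g1")   # GRAPH <urn:g1>, not a FROM NAMED graph

for label, graphs in [("BGP1", bgp1), ("BGP2", bgp2), ("BGP3", bgp3), ("BGP4", bgp4)]:
    print(label, [sorted(g) for g in graphs])
```

BGP4 gets an empty list: g1 contributes its triples to the default graph, but it is not available by name inside a GRAPH clause.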

 

Querying Only the Default Graph

If you wish to query only the unnamed triples that appear in the default graph, pass “STRICT_DEFAULT=T” in the options of your PL/SQL call (not in your SPARQL). This flag restricts the default graph to unnamed triples when no dataset information is specified.
The options attribute identifies options that can affect the results of queries. Options are expressed as keyword-value pairs.
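
The effect of the flag can be sketched as a plain Python simulation. This is not the SEM_MATCH API: the function name is hypothetical, and the behaviour without the flag (the default graph covering unnamed triples plus the triples of every named graph) is an assumption about Oracle's default that should be checked against the product documentation.

```python
def default_graph_without_dataset_info(unnamed, named_graphs, strict_default=False):
    """Default graph seen by a query that specifies no graphs or named graphs."""
    if strict_default:
        # STRICT_DEFAULT=T: only the unnamed triples.
        return set(unnamed)
    # Assumed behaviour without the flag: unnamed triples plus every named graph.
    merged = set(unnamed)
    for triples in named_graphs.values():
        merged |= triples
    return merged


# Hypothetical stored data, with triples represented by labels only.
unnamed = {"t1", "t2", "t3"}
named_graphs = {"urn:g1": {"t4", "t5"}, "urn:g2": {"t6", "t7"}}

strict = default_graph_without_dataset_info(unnamed, named_graphs, strict_default=True)
relaxed = default_graph_without_dataset_info(unnamed, named_graphs)
print(sorted(strict))
print(sorted(relaxed))
```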


    

Further Reading

1.6.2.1 GRAPH Keyword Support

Intro to Graphs at Oracle
