Proposed generic pattern to model ordered collections w/ RDF quads

June 11, 2020 | 13 minute read
Michael J. Sullivan
Principal Cloud Solutions Architect

Let's face it, coding for recursive RDF lists is less than enjoyable. I'm convinced that fact alone is one of the key reasons graph developers have overwhelmingly turned to property graphs, as perhaps 80% of what developers do is work with lists, arrays, and hierarchies. Not surprisingly, they don't want to deal with something that they feel "sucks".

To that end I herein propose a flexible, efficient, generic pattern to model ordered collections using the RDF quad mechanism mentioned in my last few posts. 

Note: an analogous pattern could also be implemented in either RDF* or RDF# as both extensions allow for edge properties. And while yes, reification could also be used, I'd rather stick needles in my eyes than suggest using reification! :P

Besides dealing with list ordering in a clean, flexible, and array-like manner (as opposed to recursively fetching prev and next), I also feel that collections should describe themselves so that they can be easily understood by a "controller" such as a GraphQL resolver. With that in mind, the idea is to introduce a sort of extensible triples "registry" which documents the edgeProperties as well as the objectProperties for any given collection. Here, edgeProperties can be thought of as "structural" attributes of the collection (e.g. sort order, grouping, segmentation, sub-levels, etc.), while objectProperties can be thought of as "content" attributes owned by the individual members of the collection (e.g. price, title, author, genre, etc.). Think of these as "hints" that the developer can use to generate the query and to know what will be returned. This gives the content creator or taxonomist the flexibility to explicitly specify which properties to render in the output payload on a per-collection basis. Additionally, collection members can belong to other collections without restriction, each collection with its own unique set of properties to be output.

Note: regarding the above, it might make sense to employ SHACL constraints on the members to be sure they have the attributes being requested. 
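The kind of check suggested in the note can be pictured with a plain-Python stand-in. This is only a sketch of what a SHACL shape would enforce, not SHACL itself; the function name `missing_properties` and the sample data are hypothetical.

```python
from typing import Dict, List, Set

def missing_properties(members: Dict[str, Set[str]], required: List[str]) -> Dict[str, List[str]]:
    """For each member, list the required properties it does not carry."""
    report: Dict[str, List[str]] = {}
    for uri, present in members.items():
        gaps = [prop for prop in required if prop not in present]
        if gaps:
            report[uri] = gaps
    return report

# Hypothetical sample: the properties each member actually has.
members = {
    "ex:book1": {"ex:hasTitle", "ex:hasPrice", "ex:hasGenre"},
    "ex:book2": {"ex:hasTitle"},
}
# The collection's objectProperty hints.
required = ["ex:hasTitle", "ex:hasPrice", "ex:hasGenre"]

report = missing_properties(members, required)
print(report)
```

Members that satisfy every hint are left out of the report, so an empty result means the collection can be rendered without gaps.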

The basic collection RDF "schema" is shown both above as a graph as well as below in quad table format with notes:

| Subject         | Predicate           | Object           | NamedGraph | Notes                                         |
|-----------------|---------------------|------------------|------------|-----------------------------------------------|
| ex:MyCollection | rdf:type            | list:List        |            | MyCollection is a List                        |
| ex:MyCollection | list:hasMember      | ex:book1         | ex:ng1     | Core triple “edge”                            |
| ex:MyCollection | list:hasMember      | ex:book2         | ex:ng2     | Core triple “edge”                            |
| ex:MyCollection | list:hasMember      | ex:book3         | ex:ng3     | Core triple “edge”                            |
| ex:MyCollection | list:hasMember      | ex:book4         | ex:ng4     | Core triple “edge”                            |
| ex:MyCollection | list:edgeProperty   | ex:primarySort   |            | Optional edge-property query/rendering hint   |
| ex:MyCollection | list:edgeProperty   | ex:secondarySort |            | Optional edge-property query/rendering hint   |
| ex:MyCollection | list:objectProperty | ex:hasTitle      |            | Optional object-property query/rendering hint |
| ex:MyCollection | list:objectProperty | ex:hasPrice      |            | Optional object-property query/rendering hint |
| ex:MyCollection | list:objectProperty | ex:hasGenre      |            | Optional object-property query/rendering hint |

 

As you can see, we use the collection to gather four book entities into a list via a list:hasMember predicate.

Note: we can also implement list:isMemberOf to allow child nodes to choose their parent collection, then use an inverse axiom to entail the corresponding list:hasMember. This would allow the same SPARQL query to be used for both parent/child and child/parent types of collections!
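The inverse entailment in the note can be illustrated outside a reasoner. The following is a minimal Python sketch, assuming quads are held as plain tuples; the function name `entail_has_member` is hypothetical, and carrying the named graph over to the entailed quad is my assumption. A real deployment would more likely declare an owl:inverseOf axiom and let the triple store do the work.

```python
from typing import List, Optional, Tuple

# A quad is (subject, predicate, object, named_graph); named_graph may be None.
Quad = Tuple[str, str, str, Optional[str]]

def entail_has_member(quads: List[Quad]) -> List[Quad]:
    """Return the input quads plus one list:hasMember quad per list:isMemberOf quad."""
    existing = set(quads)
    result = list(quads)
    for s, p, o, g in quads:
        if p == "list:isMemberOf":
            # Swap subject and object; keep the same named graph (assumption).
            inverse = (o, "list:hasMember", s, g)
            if inverse not in existing:
                existing.add(inverse)
                result.append(inverse)
    return result

# The child asserts its own membership.
child_asserted = [("ex:book1", "list:isMemberOf", "ex:MyCollection", "ex:ng1")]
entailed = entail_has_member(child_asserted)
print(entailed[-1])
```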

For each member of the collection, we specify a NamedGraph (URI) which we will then use to attach edge-specific attributes as needed. In this particular example, we plan to specify two edge properties: ex:primarySort and ex:secondarySort which get populated as triples as shown below:

| Subject | Predicate        | Object | NamedGraph | Notes                     |
|---------|------------------|--------|------------|---------------------------|
| ex:ng1  | ex:primarySort   | 1      |            | hasMember “edge” property |
| ex:ng2  | ex:primarySort   | 2      |            | hasMember “edge” property |
| ex:ng3  | ex:primarySort   | 2      |            | hasMember “edge” property |
| ex:ng4  | ex:primarySort   | 3      |            | hasMember “edge” property |
| ex:ng2  | ex:secondarySort | 1      |            | hasMember “edge” property |
| ex:ng3  | ex:secondarySort | 2      |            | hasMember “edge” property |
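To make the role of the named graph concrete, here is a small in-memory sketch in Python. It assumes the quads are stored as tuples and shows how the graph URI of each membership edge doubles as the subject of that edge's properties. The function name `edge_properties` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (subject, predicate, object, named_graph); named_graph is None for the default graph.
Quad = Tuple[str, str, object, Optional[str]]

QUADS: List[Quad] = [
    # Membership edges, each asserted in its own named graph.
    ("ex:MyCollection", "list:hasMember", "ex:book1", "ex:ng1"),
    ("ex:MyCollection", "list:hasMember", "ex:book2", "ex:ng2"),
    ("ex:MyCollection", "list:hasMember", "ex:book3", "ex:ng3"),
    ("ex:MyCollection", "list:hasMember", "ex:book4", "ex:ng4"),
    # Edge properties: the named graph URI is the subject.
    ("ex:ng1", "ex:primarySort", 1, None),
    ("ex:ng2", "ex:primarySort", 2, None),
    ("ex:ng3", "ex:primarySort", 2, None),
    ("ex:ng4", "ex:primarySort", 3, None),
    ("ex:ng2", "ex:secondarySort", 1, None),
    ("ex:ng3", "ex:secondarySort", 2, None),
]

def edge_properties(collection: str, quads: List[Quad]) -> Dict[object, Dict[str, object]]:
    """Map each member to the properties attached to its membership edge."""
    by_member: Dict[object, Dict[str, object]] = {}
    for s, p, o, g in quads:
        if s == collection and p == "list:hasMember" and g is not None:
            # Every triple whose subject is the edge's graph URI describes that edge.
            by_member[o] = {ep: ev for es, ep, ev, _ in quads if es == g}
    return by_member

props = edge_properties("ex:MyCollection", QUADS)
print(props["ex:book3"])
```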

 

The final bit that makes for a nice-looking payload coming back from the SPARQL query is to make sure that each of our properties has a human-readable label defined. Something like the following will do:

| Subject          | Predicate  | Object          | NamedGraph | Notes                                             |
|------------------|------------|-----------------|------------|---------------------------------------------------|
| ex:hasPrice      | rdfs:label | Price           |            | Label to be rendered in matrix payload (required) |
| ex:hasGenre      | rdfs:label | Genre           |            | Label to be rendered in matrix payload (required) |
| ex:hasTitle      | rdfs:label | Title           |            | Label to be rendered in matrix payload (required) |
| ex:primarySort   | rdfs:label | _SORT_PRIMARY   |            | Label to be rendered in matrix payload (required) |
| ex:secondarySort | rdfs:label | _SORT_SECONDARY |            | Label to be rendered in matrix payload (required) |

 

Ok then, assuming each book has Title, Price, and Genre properties defined elsewhere, we have all the RDF data in place. The desired table payload we want coming back from the SPARQL query for our particular collection should look like the following:

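| _URI     | _SORT_PRIMARY | _SORT_SECONDARY | Genre       | Price  | Title                                     |
|----------|---------------|-----------------|-------------|--------|-------------------------------------------|
| ex:book1 | 1             |                 | Metaphysics | $25.00 | Zen and the art of motorcycle maintenance |
| ex:book2 | 2             | 1               | Science     | $15.00 | The Fractal Geometry of Nature            |
| ex:book3 | 2             | 2               | Science     | $20.00 | A brief history of time                   |
| ex:book4 | 3             |                 | Fiction     | $30.00 | Hound of the Baskervilles                 |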
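The row order in the payload above follows the primary sort first and the secondary sort second, with rows lacking a secondary value placed ahead of those that have one. A controller that receives unordered rows could reproduce that ordering with a sketch like the following; the function names are hypothetical, and ranking a missing value lowest mirrors how SPARQL's ORDER BY treats unbound variables.

```python
from typing import List

def sort_key(row: dict) -> tuple:
    """Order by primary sort, then by secondary sort with missing values first."""
    secondary = row.get("_SORT_SECONDARY")
    return (row["_SORT_PRIMARY"], secondary is not None, secondary or 0)

def order_rows(rows: List[dict]) -> List[dict]:
    return sorted(rows, key=sort_key)

# Rows as they might arrive from the store, in no particular order.
rows = [
    {"_URI": "ex:book3", "_SORT_PRIMARY": 2, "_SORT_SECONDARY": 2},
    {"_URI": "ex:book4", "_SORT_PRIMARY": 3, "_SORT_SECONDARY": None},
    {"_URI": "ex:book1", "_SORT_PRIMARY": 1, "_SORT_SECONDARY": None},
    {"_URI": "ex:book2", "_SORT_PRIMARY": 2, "_SORT_SECONDARY": 1},
]
ordered = [row["_URI"] for row in order_rows(rows)]
print(ordered)
```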

 

To achieve this, we "could" hardwire our SPARQL query to look something like the following (where "MyList" is the URI of our collection entity):

# the ex: and list: namespace IRIs below are assumed for illustration
PREFIX ex:   <http://example.org/>
PREFIX list: <http://example.org/list#>

SELECT (?lm AS ?_URI) (?ps AS ?_SORT_PRIMARY) (?ss AS ?_SORT_SECONDARY) (?gr AS ?Genre) (?pr AS ?Price) (?tt AS ?Title)
WHERE {
   GRAPH ?ng { ex:MyList list:hasMember ?lm } .
   ?ng ex:primarySort ?ps .
   OPTIONAL { ?ng ex:secondarySort ?ss } .
   ?lm ex:hasGenre ?gr ;
       ex:hasPrice ?pr ;
       ex:hasTitle ?tt .
}
ORDER BY ?ps ?ss

But the above is not reusable and would require foreknowledge of the intent of the specific collection in order to render it. This puts the onus on the developer to know the ins and outs of the content model and to respond to any and all content changes, which makes such a solution brittle.

Instead, knowing that our collection has hints, we want to first "fetch" the object & edge property values as defined by the collection and then dynamically build the desired query from those hints. As such, a more algorithmic approach would look something more like the following set of three SPARQL queries (where "MyList" is the URI of our collection entity):

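#fetch the list of edge properties for this collection
#also fetch the corresponding property labels
#(the number of rows returned is the edge property count)
#the ex: and list: namespace IRIs are assumed for illustration
PREFIX ex:   <http://example.org/>
PREFIX list: <http://example.org/list#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?ep ?epL
WHERE {
   ex:MyList list:edgeProperty ?ep .
   ?ep rdfs:label ?epL .
}

#fetch the list of object properties for this collection
#also fetch the corresponding property labels
#(the number of rows returned is the object property count)
PREFIX ex:   <http://example.org/>
PREFIX list: <http://example.org/list#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?op ?opL
WHERE {
   ex:MyList list:objectProperty ?op .
   ?op rdfs:label ?opL .
}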

#build the query from the list of 1..n object & edge properties as above
#the following is meant to be treated as pseudocode:
#ep.N and op.N stand for the Nth fetched property URI, epL.N and opL.N for its label
SELECT (?lm AS ?epL_URI_placeholder_unused) ...

Here the placeholders shown above (ep.N and op.N for the property URIs, epL.N and opL.N for their labels) represent the 1..n property entities returned by the initial "fetch" of the object & edge properties. The developer uses those values to build and execute a properly formatted SPARQL query that returns the whole table. The pattern shown is generic and reusable and, best of all, it is not excessively "chatty": two small fetches plus one table query. As such, it will work with any type of collection using this pattern, whether tall or wide!
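The query-building step can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function names, the `Hint` type, and the `http://example.org/` namespace IRIs are all assumptions. It is also a flattened variant that places the OPTIONAL patterns directly in the outer WHERE clause instead of in sub-selects, so a member is kept even when it lacks some of the hinted properties.

```python
import re
from typing import List, Tuple

# (property IRI, rdfs:label) pairs, as returned by the two "fetch" queries.
Hint = Tuple[str, str]

LIST_HAS_MEMBER = "http://example.org/list#hasMember"  # assumed namespace IRI

def to_var(label: str) -> str:
    """Turn a label into a safe SPARQL variable name."""
    return re.sub(r"\W", "_", label)

def build_collection_query(collection: str, edge_hints: List[Hint], object_hints: List[Hint]) -> str:
    """Build one SELECT query that returns the whole table for a collection."""
    projections = ["(?lm AS ?_URI)"]
    patterns = [f"GRAPH ?ng {{ <{collection}> <{LIST_HAS_MEMBER}> ?lm }}"]
    # Edge properties hang off the named graph of each membership edge.
    for i, (iri, label) in enumerate(edge_hints, start=1):
        projections.append(f"(?epv{i} AS ?{to_var(label)})")
        patterns.append(f"OPTIONAL {{ ?ng <{iri}> ?epv{i} }}")
    # Object properties hang off the member itself.
    for i, (iri, label) in enumerate(object_hints, start=1):
        projections.append(f"(?opv{i} AS ?{to_var(label)})")
        patterns.append(f"OPTIONAL {{ ?lm <{iri}> ?opv{i} }}")
    body = "\n".join(f"  {pattern}" for pattern in patterns)
    return f"SELECT {' '.join(projections)}\nWHERE {{\n{body}\n}}"

query = build_collection_query(
    "http://example.org/MyList",
    edge_hints=[("http://example.org/primarySort", "_SORT_PRIMARY")],
    object_hints=[("http://example.org/hasTitle", "Title")],
)
print(query)
```

Because the labels become output variable names, a label that sanitizes to lm, ng, or one of the internal epvN/opvN names would clash with the query's own variables, so a production version would need to guard against that and against untrusted IRIs.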

Note: With an Oracle database, we can create even more efficient SPARQL queries when rendering wide tables by eliminating all OPTIONALs and using a SQL PIVOT operation. That technique is beyond the scope of this post, however.

FOR EXTRA CREDIT:

  • How would you extend the above to enable multi-language attributes?
