Let's face it, coding for recursive RDF lists is less than enjoyable. I'm convinced that fact alone is one of the key reasons graph developers have overwhelmingly turned to property graphs, since perhaps 80% of what developers do is work with lists, arrays, and hierarchies. Not surprisingly, they don't want to deal with something that they feel "sucks".
To that end I herein propose a flexible, efficient, generic pattern to model ordered collections using the RDF quad mechanism mentioned in my last few posts.
Note: an analogous pattern could also be implemented in either RDF* or RDF# as both extensions allow for edge properties. And while yes, reification could also be used, I'd rather stick needles in my eyes than suggest using reification! :P
Besides dealing with list ordering in a clean, flexible, and array-like manner (as opposed to recursively fetching prev and next), I also feel that collections should describe themselves so that they can be easily understood by a "controller" such as a GraphQL resolver. With that in mind, the idea is to introduce a sort of extensible triples "registry" which documents the edgeProperties as well as the objectProperties for any given collection. The edgeProperties can be thought of as "structural" attributes of the collection (e.g. sort order, grouping, segmentation, sub levels, etc.), while the objectProperties can be thought of as "content" attributes owned by the individual members of the collection (e.g. price, title, author, genre, etc.).
Think of these as "hints" that the developer uses to generate the query as well as to know what will be returned. This gives the content-creator/taxonomist the flexibility to explicitly specify, on a per-collection basis, any property to render in the output payload. Additionally, members can belong to any number of other collections with no restriction, each collection with its own set of properties to be output.
Note: regarding the above, it might make sense to employ SHACL constraints on the members to be sure they have the attributes being requested.
The basic collection RDF "schema" is shown both above as a graph as well as below in quad table format with notes:
| Subject | Predicate | Object | NamedGraph | Notes |
| --- | --- | --- | --- | --- |
| ex:MyCollection | rdf:type | list:List | | MyCollection is a List |
| ex:MyCollection | list:hasMember | ex:book1 | ex:ng1 | Core triple “edge” |
| ex:MyCollection | list:hasMember | ex:book2 | ex:ng2 | Core triple “edge” |
| ex:MyCollection | list:hasMember | ex:book3 | ex:ng3 | Core triple “edge” |
| ex:MyCollection | list:hasMember | ex:book4 | ex:ng4 | Core triple “edge” |
| ex:MyCollection | list:edgeProperty | ex:primarySort | | Optional edge-property query/rendering hint |
| ex:MyCollection | list:edgeProperty | ex:secondarySort | | Optional edge-property query/rendering hint |
| ex:MyCollection | list:objectProperty | ex:hasTitle | | Optional object-property query/rendering hint |
| ex:MyCollection | list:objectProperty | ex:hasPrice | | Optional object-property query/rendering hint |
| ex:MyCollection | list:objectProperty | ex:hasGenre | | Optional object-property query/rendering hint |
As you can see, we use the collection to gather four book entities into a list via a list:hasMember predicate.
Note: we can also implement list:isMemberOf to allow child nodes to choose their parent collection, then use an inverse axiom to entail the corresponding list:hasMember. This would allow the same SPARQL query to be used for both parent/child and child/parent types of collections!
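To make the inverse idea concrete, here is a minimal Python sketch of what such an entailment step might do. It is not a real reasoner: quads are plain (subject, predicate, object, named graph) tuples, the function name entail_has_member is hypothetical, ex:book5 and ex:ng5 are made-up sample values, and it assumes the inferred list:hasMember quad should keep the named graph of the asserted list:isMemberOf quad so that edge properties still attach to it.

```python
# Quads are (subject, predicate, object, named_graph) tuples (illustrative layout).
HAS_MEMBER = "list:hasMember"
IS_MEMBER_OF = "list:isMemberOf"


def entail_has_member(quads):
    """Return the input quads plus one list:hasMember quad per list:isMemberOf quad.

    Assumption: the inferred quad reuses the named graph of the asserted one.
    """
    known = set(quads)
    inferred = []
    for subject, predicate, obj, graph in quads:
        if predicate == IS_MEMBER_OF:
            inverse = (obj, HAS_MEMBER, subject, graph)
            if inverse not in known:
                known.add(inverse)
                inferred.append(inverse)
    return list(quads) + inferred


# Hypothetical child-asserted membership: book5 declares its own parent collection.
child_quads = [("ex:book5", IS_MEMBER_OF, "ex:MyCollection", "ex:ng5")]
all_quads = entail_has_member(child_quads)
```

In a real triple store this step would normally be handled by the store's inference support rather than by application code.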
For each member of the collection, we specify a NamedGraph (URI) which we will then use to attach edge-specific attributes as needed. In this particular example, we plan to specify two edge properties: ex:primarySort and ex:secondarySort which get populated as triples as shown below:
| Subject | Predicate | Object | NamedGraph | Notes |
| --- | --- | --- | --- | --- |
| ex:ng1 | ex:primarySort | 1 | | hasMember “edge” property |
| ex:ng2 | ex:primarySort | 2 | | hasMember “edge” property |
| ex:ng3 | ex:primarySort | 2 | | hasMember “edge” property |
| ex:ng4 | ex:primarySort | 3 | | hasMember “edge” property |
| ex:ng2 | ex:secondarySort | 1 | | hasMember “edge” property |
| ex:ng3 | ex:secondarySort | 2 | | hasMember “edge” property |
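As a rough illustration of how the membership quads and the edge-property triples fit together, the following Python sketch holds the data from the two tables above as plain tuples and orders the members by their sort keys. The tuple layout, the use of None for the default graph, the function name ordered_members, and the choice to treat a missing sort key as 0 are all assumptions made for this example, not part of the pattern itself.

```python
# (subject, predicate, object, named_graph); None stands for the default graph.
# The membership quads are deliberately listed out of order.
quads = [
    ("ex:MyCollection", "list:hasMember", "ex:book4", "ex:ng4"),
    ("ex:MyCollection", "list:hasMember", "ex:book2", "ex:ng2"),
    ("ex:MyCollection", "list:hasMember", "ex:book1", "ex:ng1"),
    ("ex:MyCollection", "list:hasMember", "ex:book3", "ex:ng3"),
    ("ex:ng1", "ex:primarySort", 1, None),
    ("ex:ng2", "ex:primarySort", 2, None),
    ("ex:ng3", "ex:primarySort", 2, None),
    ("ex:ng4", "ex:primarySort", 3, None),
    ("ex:ng2", "ex:secondarySort", 1, None),
    ("ex:ng3", "ex:secondarySort", 2, None),
]


def ordered_members(quads, collection):
    """Return the collection's members sorted by primary, then secondary sort key."""
    # Index default-graph triples by subject; for edges the subject is the graph URI.
    edge_props = {}
    for subject, predicate, obj, graph in quads:
        if graph is None:
            edge_props.setdefault(subject, {})[predicate] = obj

    rows = []
    for subject, predicate, obj, graph in quads:
        if subject == collection and predicate == "list:hasMember":
            props = edge_props.get(graph, {})
            # Assumption: a missing sort key is treated as 0.
            rows.append((props.get("ex:primarySort", 0),
                         props.get("ex:secondarySort", 0),
                         obj))
    rows.sort()
    return [member for _, _, member in rows]


members = ordered_members(quads, "ex:MyCollection")
# members == ["ex:book1", "ex:book2", "ex:book3", "ex:book4"]
```

The point of the sketch is that the named graph URI acts as the join key between a membership edge and its attributes, which is what lets the list behave like an array rather than a linked chain.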
The final bit that makes for a nice-looking payload coming back from the SPARQL query is to make sure that each of our properties has a human-readable label defined. Something like the following will do:
| Subject | Predicate | Object | NamedGraph | Notes |
| --- | --- | --- | --- | --- |
| ex:hasPrice | rdfs:label | Price | | Label to be rendered in matrix payload (required) |
| ex:hasGenre | rdfs:label | Genre | | Label to be rendered in matrix payload (required) |
| ex:hasTitle | rdfs:label | Title | | Label to be rendered in matrix payload (required) |
| ex:primarySort | rdfs:label | _SORT_PRIMARY | | Label to be rendered in matrix payload (required) |
| ex:secondarySort | rdfs:label | _SORT_SECONDARY | | Label to be rendered in matrix payload (required) |
Ok then, assuming each book has Title, Price, and Genre properties defined elsewhere, we have all the RDF data in place. The desired table payload we want coming back from the SPARQL query for our particular collection should look like the following:
| _URI | _SORT_PRIMARY | _SORT_SECONDARY | Genre | Price | Title |
| --- | --- | --- | --- | --- | --- |
| ex:book1 | 1 | | Metaphysics | $25.00 | Zen and the art of motorcycle maintenance |
| ex:book2 | 2 | 1 | Science | $15.00 | The Fractal Geometry of Nature |
| ex:book3 | 2 | 2 | Science | $20.00 | A brief history of time |
| ex:book4 | 3 | | Fiction | $30.00 | Hound of the Baskervilles |
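To achieve this, we "could" hardwire our SPARQL query to look something like the following (where ex:MyList stands in for the URI of our collection entity):
# the ex: and list: namespace IRIs are placeholders; substitute your own
PREFIX ex: <http://example.org/>
PREFIX list: <http://example.org/list#>
SELECT (?lm AS ?_URI) (?ps AS ?_SORT_PRIMARY) (?ss AS ?_SORT_SECONDARY) (?gr AS ?Genre) (?pr AS ?Price) (?tt AS ?Title)
WHERE {
  # each membership quad lives in its own named graph ?ng
  GRAPH ?ng { ex:MyList list:hasMember ?lm } .
  # edge properties hang off the named graph URI
  ?ng ex:primarySort ?ps .
  OPTIONAL { ?ng ex:secondarySort ?ss } .
  # content properties hang off the member itself
  ?lm ex:hasGenre ?gr ;
      ex:hasPrice ?pr ;
      ex:hasTitle ?tt .
}
ORDER BY ?ps ?ss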
But the above is not reusable and would require foreknowledge of the intent of the specific collection in order to render it. This puts the onus on the developer to know the insides and outsides of the content model, as well as having to respond to any and all content changes -- consequently making such a solution brittle in the process.
Instead, knowing that our collection has hints, we want to first "fetch" the object & edge properties as defined by the collection and then dynamically build the desired query from those hints. As such, a more algorithmic approach would look more like the following set of three SPARQL queries (where ex:MyList stands in for the URI of our collection entity):
# the ex: and list: namespace IRIs are placeholders; substitute your own
PREFIX ex: <http://example.org/>
PREFIX list: <http://example.org/list#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# fetch the list of edge properties for this collection
# also fetch the corresponding property labels
# (the number of rows returned is the edge property count)
SELECT DISTINCT ?ep ?epL
WHERE {
  ex:MyList list:edgeProperty ?ep .
  ?ep rdfs:label ?epL .
}

# same PREFIX declarations as above
# fetch the list of object properties for this collection
# also fetch the corresponding property labels
# (the number of rows returned is the object property count)
SELECT DISTINCT ?op ?opL
WHERE {
  ex:MyList list:objectProperty ?op .
  ?op rdfs:label ?opL .
}
# build the query from the list of 1..n object & edge properties as above
# the following is meant to be treated as pseudocode:
# ep.N / op.N are the fetched property URIs, epL.N / opL.N their labels
SELECT (?lm AS ?_URI) (?epv1 AS ?epL.1) (?epv2 AS ?epL.2) ... (?opv1 AS ?opL.1) (?opv2 AS ?opL.2) ...
WHERE {
  GRAPH ?ng { ex:MyList list:hasMember ?lm } .
  # iterate over the list of edge properties
  # (each OPTIONAL follows the GRAPH pattern, so a member lacking a property is still returned)
  OPTIONAL { ?ng ep.1 ?epv1 } .
  OPTIONAL { ?ng ep.2 ?epv2 } .
  ...
  OPTIONAL { ?ng ep.N ?epvN } .
  # iterate over the list of object properties
  OPTIONAL { ?lm op.1 ?opv1 } .
  OPTIONAL { ?lm op.2 ?opv2 } .
  ...
  OPTIONAL { ?lm op.N ?opvN } .
}
Here the placeholders ep.N and op.N (along with their labels epL.N and opL.N) represent the 1..n property entities returned by the initial "fetch" of the object & edge properties. The developer uses those values to build and execute a properly formatted SPARQL query that returns the whole table. The pattern shown is generic and reusable. And best of all it is not excessively "chatty". As such, it will work with any type of collection using this pattern, whether tall or wide!
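The "build" step described above could be sketched in Python roughly as follows. This is only one possible shape for it: the function name build_collection_query and its parameters are hypothetical, the hints are assumed to arrive as (property, label) pairs from the two fetch queries, and the generated text emits one OPTIONAL per property directly after the GRAPH pattern. The sketch also assumes every label is usable as a SPARQL variable name and rejects any that is not; PREFIX declarations are passed in by the caller.

```python
import re

# Conservative check that a label can serve as a SPARQL variable name (assumption).
_VALID_LABEL = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_collection_query(collection, edge_props, object_props, prefixes=""):
    """Build the table query from hints.

    edge_props / object_props: lists of (property, label) pairs, e.g. the rows
    returned by the two "fetch" queries. `prefixes` is prepended verbatim.
    """
    select = ["(?lm AS ?_URI)"]
    optionals = []
    # Edge properties hang off the named graph ?ng, object properties off ?lm.
    for kind, node, props in (("ep", "?ng", edge_props), ("op", "?lm", object_props)):
        for i, (prop, label) in enumerate(props, start=1):
            if not _VALID_LABEL.match(label):
                raise ValueError(f"label {label!r} is not usable as a variable name")
            select.append(f"(?{kind}v{i} AS ?{label})")
            optionals.append(f"  OPTIONAL {{ {node} {prop} ?{kind}v{i} }}")

    lines = [
        "SELECT " + " ".join(select),
        "WHERE {",
        f"  GRAPH ?ng {{ {collection} list:hasMember ?lm }}",
        *optionals,
        "}",
    ]
    return (prefixes + "\n" if prefixes else "") + "\n".join(lines)


# Hints for the book collection used throughout this post.
query = build_collection_query(
    "ex:MyList",
    edge_props=[("ex:primarySort", "_SORT_PRIMARY"),
                ("ex:secondarySort", "_SORT_SECONDARY")],
    object_props=[("ex:hasGenre", "Genre"),
                  ("ex:hasPrice", "Price"),
                  ("ex:hasTitle", "Title")],
)
print(query)
```

Because the column aliases come straight from the rdfs:label values, the taxonomist can rename or add columns without any code change, which is the whole point of keeping the hints in the graph.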
Note: With an Oracle database, we can create even more efficient SPARQL queries when rendering wide tables by eliminating all optionals and using a SQL PIVOT operation. This technique is beyond the scope of this blog however.