This is the second in a series of blogs describing how one would go about modeling a high-level ontology.
In my last blog, I introduced the idea of the CBox, the portion of the Knowledge Graph where taxonomists create and manage both formal and informal taxonomies, as well as controlled vocabularies, folksonomies, facets, etc. I also recommended adopting an upper-level ontology to help bootstrap your ontology development. To this end, I will be using gist (see: https://www.semanticarts.com/gist/), an upper-level ontology developed by Semantic Arts, to illustrate a series of micro-patterns you can adopt for your next project. As gist is open-source, you are free to use it, but of course you can use any upper-level ontology you wish, as these patterns are fairly universal. Let’s start with some very simple examples of “buckets”.
Almost all ontologies make use of the concept of classes to provide the most basic kind of classification. Class membership (not inheritance, which is the job of rdfs:subClassOf) is declared in RDF with the rdf:type property. Example:
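A minimal sketch of such a statement, assuming a hypothetical ex:Michael entity and the ex: prefix used in the listings later in this post:

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <https://taxa.example.com/> .

# ex:Michael is a hypothetical entity name used only for illustration.
ex:Michael rdf:type ex:OracleEmployee .
```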
Which can be read as: Michael is an Oracle Employee, that is, an instance of the ex:OracleEmployee class. In fact, rdf:type is so common that Turtle and SPARQL provide a shorthand for it: the keyword a.
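A minimal sketch of the shorthand form, assuming a hypothetical ex:Michael entity and the ex: prefix used in the listings later in this post:

```turtle
@prefix ex: <https://taxa.example.com/> .

# "a" is Turtle's built-in abbreviation for rdf:type.
# ex:Michael is a hypothetical entity name used only for illustration.
ex:Michael a ex:OracleEmployee .
```

Both spellings produce exactly the same triple, so the choice is purely one of readability.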
In this way, ex:OracleEmployee can be thought of as a bucket to which each employee entity adds itself by way of an rdf:type predicate. As you familiarize yourself with RDF you will see that class structures are extremely common. Perhaps a bit too common, in fact. Many ontologies are incredibly complex, some consisting of 300k classes (e.g. SNOMED). Such an ontology can be daunting to use as well as to manage. To wit: if everything is a class of a class, then your ontology becomes inflexible, overly complex, and brittle. It is better to define formal classes only where they are really needed to address specific use cases and to leave other, more flexible types of classification to do the heavy lifting, with the understanding that such entities can always be linked to formal class axioms in the future. The opposite is not true, however: complexity cannot easily be made simple.
From a diagrammatic point of view, a typical class bucket would look like the following:
Where ex:Smart_People is the class bucket (as managed by the taxonomist in the CBox) and each of the three person entities points to ex:Smart_People via rdf:type. This of course also means that the three person entities are now instances of the class, which may or may not be what you want**. FWIW, this is perhaps the most common pattern used in the RDF industry. And to reiterate: it is overused.
THE ABOVE IN TURTLE FORMAT:
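@prefix ex1: <https://data.example.com/> .
@prefix ex: <https://taxa.example.com/> .
@prefix gist: <https://semantic-arts.com/gist/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

gist:Person a owl:Class .

# The class bucket: a class in its own right, whose members are all persons.
ex:Smart_People a owl:Class ;
    rdfs:subClassOf gist:Person .

# Each person joins the bucket by way of rdf:type (written "a").
ex1:SteveJobs a ex:Smart_People .
ex1:BillGates a ex:Smart_People .
ex1:ElonMusk a ex:Smart_People .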
There are many other types of buckets however. Next up is another super-simple example: tag buckets.
In contrast to class buckets, let’s take a look at what a more informal, lightweight classification mechanism might look like using an upper-level ontology like gist. Here the taxonomist has created some “tag” entities named ex:SciFi and ex:Comedy, both of type gist:Tag. However, instead of using rdf:type for classification, the movie entities use the gist:categorizedBy predicate to indicate which bucket they belong to. From a purely categorization point of view this does the same thing as rdf:type, without the baggage of making the individual movie entities instances of the categories themselves. It also allows multiple tag classifications (e.g. this movie is both a comedy and science fiction) without the problem of potential class conflict. Should we later wish to define certain movie categories as formal classes in their own right, we could always add an axiom to our graph as needed. This is what I mean by flexibility, but it needs to be considered at the beginning of the project.
THE ABOVE IN TURTLE FORMAT:
@prefix ex1: <https://data.example.com/> .
@prefix ex: <https://taxa.example.com/> .
@prefix gist: <https://semantic-arts.com/gist/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

gist:Tag rdfs:subClassOf gist:Category .

# The tag buckets, each with its descriptive text.
ex:SciFi a gist:Tag ;
    gist:hasTag "Science Fiction" .
ex:Comedy a gist:Tag ;
    gist:hasTag "Comedy" .

# The movies are instances of ex:Movie only, never of the tags.
ex1:YoungFrankenstein a ex:Movie .
ex1:HitchHikersGuide a ex:Movie .
ex1:Alien a ex:Movie .
ex1:BladeRunner a ex:Movie .

# Categorization; a movie may carry more than one tag.
ex1:YoungFrankenstein gist:categorizedBy ex:Comedy .
ex1:HitchHikersGuide gist:categorizedBy ex:Comedy .
ex1:HitchHikersGuide gist:categorizedBy ex:SciFi .
ex1:Alien gist:categorizedBy ex:SciFi .
ex1:BladeRunner gist:categorizedBy ex:SciFi .
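As for promoting a tag to a formal class later on, one possible way to do it is an OWL hasValue restriction. This is only a sketch: ex:SciFiMovie is a hypothetical class name of my own choosing, and the axiom assumes gist:categorizedBy is treated as an object property and that an OWL reasoner is run over the graph.

```turtle
@prefix ex: <https://taxa.example.com/> .
@prefix gist: <https://semantic-arts.com/gist/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# ex:SciFiMovie is a hypothetical class name, not part of the example above.
# Anything categorized by the ex:SciFi tag is inferred to be an ex:SciFiMovie,
# and every ex:SciFiMovie is inferred to carry the ex:SciFi tag.
ex:SciFiMovie a owl:Class ;
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty gist:categorizedBy ;
        owl:hasValue ex:SciFi
    ] .
```

The existing movie data does not change at all; the class is layered on top of the tags, which is the kind of flexibility described above.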
Note that with gist, the assumption is that each “tag” entity will have a gist:hasTag scalar attribute holding the text of the tag itself. That attribute is what gives a tag its “meaning” in gist. If needed, you could use SHACL to enforce this as a requirement.
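A minimal sketch of what such a SHACL constraint might look like. The shape name ex:TagShape is hypothetical, and the rule itself (at least one literal value per tag) is my assumption about what "required" should mean here:

```turtle
@prefix ex: <https://taxa.example.com/> .
@prefix gist: <https://semantic-arts.com/gist/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

# ex:TagShape is a hypothetical shape name used only for illustration.
ex:TagShape a sh:NodeShape ;
    sh:targetClass gist:Tag ;          # applies to every instance of gist:Tag
    sh:property [
        sh:path gist:hasTag ;
        sh:minCount 1 ;                # each tag needs at least one value
        sh:nodeKind sh:Literal         # and that value must be a literal
    ] .
```

A SHACL validator run against the tag listing above would report any gist:Tag that lacks a gist:hasTag value.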
** For example, Paris is logically not a kind of France. Consequently, Paris being an instance of the France class makes no sense (but being part of a French Cities category may). Nevertheless, RDF's flexibility will allow your taxonomists to define it that way if they don't know what they are doing. This is where disjointness comes into play when designing your upper-level ontology. Proper use of disjointness lets a reasoner catch logical errors like the one described here.
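A rough sketch of how disjointness might expose this mistake. All class names here (ex:City, ex:Country, ex:France) are hypothetical, and the sketch assumes France has been modeled as a class under ex:Country and that an OWL reasoner is run over the graph:

```turtle
@prefix ex1: <https://data.example.com/> .
@prefix ex: <https://taxa.example.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical classes: nothing can be both a city and a country.
ex:City a owl:Class .
ex:Country a owl:Class ;
    owl:disjointWith ex:City .

# The questionable modeling choice: France as a class under ex:Country.
ex:France a owl:Class ;
    rdfs:subClassOf ex:Country .

ex1:Paris a ex:City .

# The taxonomist's mistake. Paris is now inferred to be both an ex:City and
# an ex:Country, which the disjointness axiom makes inconsistent.
ex1:Paris a ex:France .
```

Without the owl:disjointWith axiom the last triple would be accepted silently; with it, a reasoner can flag the graph as inconsistent.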