Oracle has a long history with graph technologies, with support for RDF Triplestores going back more than ten years. During that period, property graphs were introduced to big data solutions and, more recently, to Oracle RDBMS as well. All graph technology at Oracle is available via the Spatial & Graph product line, which extends Oracle RDBMS and Big Data. While there is a lot of competition in the graph space, nobody has a more scalable, more performant offering than Oracle. And the good news is that there is a thriving Graph community here at Oracle (see links at the end of this article).
The value of graph data technology is well established, with links discussing that value provided later in this blog. However, it is fair to say that many architects and developers are only vaguely aware of graphs in general and are often too busy to consider adding an unknown to their projects — even when adding graph features to their project would significantly reduce risk, provide new functionality, and increase agility when rolling out their solution. This blog aims to introduce graph data technology at Oracle to architects, developers, and other thought leaders. Later blogs will dive deeper into the details.
Standing room only at one of the recent Graph breakout sessions @ 2018 Analytics and Data Summit
Graph databases provide a distinct advantage over other types of data stores for managing data in use cases such as:
In particular, the more “connected” your data is, the better a graph fits as a way to model it.
Why should you care about inter-connectedness? Because graphs are literally everywhere: road networks, power grids, biological networks, manufacturing, service, legal, regulatory, social networks (e.g. Facebook, LinkedIn, Twitter, Baidu, Google+), and myriad types of domain-specific taxonomies/ontologies. Graphs are intuitive and extremely flexible compared to other types of data stores. They are easy to navigate, make it easy to form, discover, or infer a path within the data (without needing joins), and are natural to visualize. Additionally, graphs do not require a predefined schema, making them uniquely agile.
Perhaps more importantly, though, implementing a graph requires one to become intimately knowledgeable about the kinds of data and relationships involved in a project, resulting in a form of insightful data documentation that often eludes technical endeavors. Having data in graph form first also makes it easier to identify the proper ML/AI methodologies to use later on.
Oracle’s strategy is to enable Spatial & Graph use cases on every platform. Not only does it support both Property and RDF-style graphs, but the platform itself supports big data, RDBMS, and cloud.
a.k.a. Labeled Graph.
A property graph consists of a set of objects or vertices, and a set of arrows or edges connecting the objects. Vertices and edges can have multiple properties, which are represented as key-value pairs.
Each VERTEX has a unique identifier and can have:
Each EDGE has a unique identifier and can have:
Property graphs are ideal for calculating shortest path, centrality, ranking, and recommendations. There are two flavors of Property Graph here at Oracle: Spatial & Graph (S&G), backed by Oracle RDBMS, and Big Data Spatial & Graph (BDS&G), backed by Hadoop and/or NoSQL.
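To make the model concrete, here is a minimal sketch in plain Python of a property graph and a shortest-path calculation over it. It does not use any Oracle API; the dictionaries, the sample people, and the `shortest_path` helper are all illustrative assumptions, and a breadth-first search stands in for whatever algorithm a real graph engine would use.

```python
from collections import deque

# Hypothetical in-memory property graph: vertices and edges each carry
# an identifier plus arbitrary key-value properties.
vertices = {
    1: {"name": "Paul", "age": 41},
    2: {"name": "Amber", "age": 29},
    3: {"name": "Dana", "age": 35},
    4: {"name": "Lee", "age": 52},
}

# Each edge: id -> (source vertex, target vertex, label, properties)
edges = {
    "e1": (1, 2, "knows", {"since": 2012}),
    "e2": (2, 3, "knows", {"since": 2015}),
    "e3": (1, 4, "knows", {"since": 2009}),
    "e4": (4, 3, "knows", {"since": 2018}),
    "e5": (3, 1, "knows", {"since": 2020}),
}


def shortest_path(source, target, label="knows"):
    """Breadth-first search over directed edges with the given label.

    Returns the vertex ids on one shortest path, or None if unreachable.
    """
    # Build an adjacency list from the edge set.
    adjacency = {v: [] for v in vertices}
    for src, dst, edge_label, _props in edges.values():
        if edge_label == label:
            adjacency[src].append(dst)

    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the predecessor chain back to the source.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency[current]:
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


path = shortest_path(1, 3)
print([vertices[v]["name"] for v in path])
```

Note that no join is needed to follow a relationship: each hop is a lookup in the adjacency list, which is the property that makes path-style questions a natural fit for this model.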
Part of the Property Graph offering, Oracle’s PGX is a toolkit for graph analysis — both for running algorithms such as PageRank against graphs and for performing SQL-like pattern-matching against graphs, using the results of algorithmic analysis. PGX provides over 40 built-in analytic functions and the ability to design and compile your own custom algorithms via Green-Marl. In addition, there are features for filtering, sorting, simplifying, and extracting subgraphs. Such mutated graphs can be saved for later use.
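For readers who have not met PageRank before, the following is a rough sketch of the idea in plain Python. It is not PGX code and does not reflect how PGX implements the algorithm; the `pagerank` function, the damping factor of 0.85, the fixed iteration count, and the tiny sample graph are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of vertex -> list of targets."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Every vertex passes its rank on in equal shares to its targets.
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-vertex graph: "c" receives links from a, b, and d.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
top = max(ranks, key=ranks.get)
print(top, round(ranks[top], 3))
```

The ranks form a probability distribution over the vertices, so they can be sorted or thresholded and then used as an input to a later pattern-matching step.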
RDF = Resource Description Framework.
An RDF Triplestore is a framework for modeling any type of data, for describing that data through one or more vocabularies, and for interoperating that data via shared ontologies and schema. All data is stored as “triples” (a.k.a. “tuples”) of the following form:
SUBJECT -> PREDICATE -> OBJECT.
Predicate values are always URIs. Subjects are URIs (or blank nodes), and objects can be URIs, blank nodes, or literals. Not surprisingly, an RDF Triplestore is much more atomic than a property graph. Predicates can be formal and globally shareable, or you can design your own to be local to your application. Examples of such global ontology namespaces include SKOS, DC, DCTERMS, OWL, RDF, RDFS, SWC, FOAF, and others (see https://www.w3.org/2006/07/SWD/Vocab/principles for a description of those namespaces).
RDF Triplestores are ideal for inferring data (e.g. the father of my father is my grandfather), data interoperability, and projects that combine graph processing with set processing at the same time.
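The grandfather example can be sketched in a few lines of plain Python. This is only an illustration of the idea of rule-based inference over triples, not how an RDF Triplestore or OWL reasoner is implemented; the `ex:` names stand in for full URIs, and the `infer_grandfathers` function and its single hard-coded rule are assumptions.

```python
# Triples are (subject, predicate, object). The "ex:" names are
# illustrative stand-ins for full URIs.
triples = {
    ("ex:alice", "ex:hasFather", "ex:bob"),
    ("ex:bob", "ex:hasFather", "ex:carl"),
    ("ex:carl", "ex:hasFather", "ex:dan"),
    ("ex:alice", "ex:name", "Alice"),
}


def infer_grandfathers(store):
    """Rule: (x hasFather y) and (y hasFather z) => (x hasGrandfather z)."""
    father_of = {(s, o) for s, p, o in store if p == "ex:hasFather"}
    inferred = set()
    for child, father in father_of:
        for person, grandfather in father_of:
            if person == father:
                inferred.add((child, "ex:hasGrandfather", grandfather))
    return inferred


new_triples = infer_grandfathers(triples)
enriched = triples | new_triples
for triple in sorted(new_triples):
    print(triple)
```

The inferred facts are themselves triples, so they can be added back to the store and queried exactly like the asserted ones.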
RDF is only available for Oracle RDBMS.
For further reading on comparing the features and usage of RDF Triple Stores versus Property Graphs the following links are provided for your reading pleasure:
Feature | RDF Triplestore | Property Graph |
Supports Oracle RDBMS | Yes | Yes |
Supports NoSQL, Hadoop, etc. | No | Yes (requires Big Data S&G) |
Query Language | SPARQL, OWL inferencing | PGQL & PGX (for in-memory analytics) |
W3C standard | Yes | No |
APIs | Java, REST, SQL | REST, SQL, Oracle Spatial and Graph Property Graph Java APIs, TinkerPop Blueprints Java APIs, Oracle Database Property Graph Java APIs |
Primary Use Case | Linked Data, Semantic Search, Inferencing | Social Network Analysis |
Graph Model | Data Federation, Knowledge Representation, Semantic Web | Path Analytics, Social Network Analysis, Entity Analytics |
Primary Industry Domain | Life Sciences, Health Care, Publishing, Finance, Networks & Communications, Defense & Intelligence | Financial, Retail Marketing, Social Media, Smart Manufacturing |
Analytics Integrations | OBIEE, Oracle R Enterprise, Oracle Data Mining | Built-in parallel in-memory analytics engine, Apache Spark, Groovy shell. |
ETL tools | Direct Mapping, RDB2RDF (RDB to RDF Mapping Language), Apache Jena app development | .opv & .ope files, JDBC-Based Data Loading, External Table-Based Data Loading, SQL*Loader-Based Data Loading |
SQL language extension | SEM_MATCH | |
Graph Visualization | Cytoscape, Tom Sawyer Perspectives | Cytoscape, Tom Sawyer Perspectives |
Data Formats | | GraphML, GraphSON, GML, Oracle Flat File Format (.opv & .ope files) |
Overall Strengths | Edge-centric. Formal theoretical foundation. Precise. Lots of standards. | Property-centric. Easy to learn. Suitable for social network analysis. |
Overall Weaknesses | Steep learning curve. Hidden complexities. | Lack of a standard query language. Hard to deal with multiple property graphs. |
Operational advantage | Inference, data interoperability, combined graph processing and set processing at the same time | Calculating shortest path, centrality, ranking, recommendations |
One thing that stands out immediately is that RDF is more atomic and verbose (it is also much more formal, enforcing class/subclass on the model). On the other hand, querying RDF via SPARQL is straightforward. In contrast, Oracle’s PGQL approach is to make the query language approachable to anyone familiar with SQL.
RDF Graph Model | Property Graph Model |
SPARQL. W3C standard language | PGQL. Not standard, but much closer to SQL (as compared to Cypher or Gremlin) |
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?knows WHERE { ?x foaf:name ?name . ?x foaf:knows ?knows } |
SELECT y.name, e, p FROM snGraph IN {SELECT…FROM…WHERE} WHERE (x WITH name = 'Paul') -[e:knows]-> (y), (z WITH name = 'Amber') -/p:knows*/-> (y), x.age > y.age GROUP BY ORDER BY LIMIT OFFSET
Yes! The diagram below shows how leveraging RDF semantics and third-party, domain-specific ontologies can enrich your Property Graph solution. This is done via RDF Views, which the Property Graph consumes. Note that mapping multi-valued properties can be a challenge. In such a design, the primary graph would be the Property Graph, with the RDF acting as an enrichment support layer.
Note also that one could alternatively leverage a Property Graph View in an RDF Triple Store if appropriate.
Unless your application has a specific requirement for data interoperability, set calculations, and/or semantics, most application developers and clients will likely find the OS&G Property Graph the richer and more approachable environment. In particular, the ability to leverage Oracle’s parallel, in-memory Graph Analytics is a huge win for almost any application. Given that, what might be considered best practices when using a property graph?
Perhaps the number one best practice would be to consider using the graph as a single source of truth for all (federated) metadata across all data silos in the enterprise. Examples of such metadata might include:
Given the above, that would leave the RDBMS or Big Data data store(s) responsible mostly for unstructured data such as HTML fragments, images, videos, binaries, XML/JSON, etc., as well as highly mutable data such as custom prices and quantities. Please note that ML/AI can be applied to this “external-to-the-graph” data to extract additional implicit metadata, using techniques such as semantic classification, image tagging, and sentiment analysis. This derived metadata can then be iteratively fed back into the graph to continuously enrich it, increasingly benefitting both business users and consumers.
We thus move from this kind of rigid (but familiar) data architecture:
…to a more agile, more pluggable, and semantically richer data architecture like this:
Fronting the graph layers with a properly architected data access layer (DAL) will hide any differences between graph and relational access methods from your core application. Be aware that your DAL will likely need to orchestrate CRUD operations against the backend as well, in particular kicking off ETL (if required). Note that Oracle supports graph views of relational data; for certain projects this can be ideal because there is no need to copy the data or use ETL techniques. There is much to explore in later blogs about best practices here.
I intend to expand on graph patterns and best practices in upcoming A-Team blogs. Your feedback is welcome. Graph technology is very, very cool and you should really consider adding it to your next project.
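As a rough illustration of that idea, the sketch below shows a data access layer that keeps metadata in a graph store and payload data in a relational store while presenting a single interface to the application. Everything here is hypothetical: `DictStore` is an in-memory stand-in for a real backend, and the class names, method names, and sample record are assumptions rather than any Oracle API.

```python
class DictStore:
    """Stand-in for a real backend (graph or relational); keeps records in a dict."""

    def __init__(self):
        self.records = {}

    def write(self, key, record):
        self.records[key] = dict(record)

    def read(self, key):
        return dict(self.records.get(key, {}))

    def delete(self, key):
        self.records.pop(key, None)


class DataAccessLayer:
    """Single entry point that hides which backend holds which part of an item."""

    def __init__(self, graph_store, relational_store):
        self.graph_store = graph_store
        self.relational_store = relational_store

    def save(self, item_id, metadata, payload):
        # Metadata goes to the graph; bulky or highly mutable data stays relational.
        self.graph_store.write(item_id, metadata)
        self.relational_store.write(item_id, payload)

    def load(self, item_id):
        # Merge both halves so callers see one record.
        record = self.graph_store.read(item_id)
        record.update(self.relational_store.read(item_id))
        return record

    def remove(self, item_id):
        self.graph_store.delete(item_id)
        self.relational_store.delete(item_id)


dal = DataAccessLayer(DictStore(), DictStore())
dal.save(
    "sku-1",
    {"category": "footwear", "brand": "Acme"},
    {"price": 59.0, "html": "<p>Trail shoe</p>"},
)
print(dal.load("sku-1"))
```

Because the application only ever calls `save`, `load`, and `remove`, either backend could later be swapped for a real graph or relational client (or for a graph view over relational data) without touching the calling code.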
Data sheet | https://www.oracle.com/assets/spatial-and-graph-ds-1738135.pdf |
Developer’s Guide | https://docs.oracle.com/en/database/oracle/oracle-database/12.2/spatl/toc.htm |
Special Interest Group | http://www.ioug.org/p/cm/ld/fid=148&gid=439 |
Google Plus community | https://plus.google.com/communities/108078829007193480508 |
Spatial & Graph blog | https://blogs.oracle.com/oraclespatial/ |
Oracle Developer Community | https://community.oracle.com/community/database/oracle-database-options/spatial |
Otube (Graph Enthusiast Channel) | https://otube.oracle.com/channel/Graph%2BEnthusiast%2BCommunity/2435 |
PGX (Oracle Technology Network) | http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html |
PGX Developer’s Guide | https://docs.oracle.com/cd/E56133_01/2.4.0/reference/overview/index.html |
W3C RDF homepage | https://www.w3.org/RDF/ |
Sample ETL (blog) | https://blogs.oracle.com/bigdataspatialgraph/from-relational-tables-to-property-graph |