Intro to Graphs at Oracle

Oracle has a long history with graph technologies, with support for RDF Triplestores going back more than ten years. During that period, property graphs were introduced to big data solutions and, more recently, to Oracle RDBMS as well. All graph technologies at Oracle are available via the Spatial & Graph product line, which extends Oracle RDBMS and Big Data. While there is a lot of competition out there in the graph space, nobody has a more scalable, more performant offering than Oracle. And the good news is that there is a thriving Graph community here at Oracle (see links at the end of this article).

The value of graph data technology is well established, with links discussing that value provided later in this blog. However, it is fair to say that many architects and developers are only vaguely aware of graphs in general and are often too busy to consider adding an unknown to their projects — even when adding graph features to their project would significantly reduce risk, provide new functionality, and increase agility when rolling out their solution. This blog aims to introduce graph data technology at Oracle to architects, developers, and other thought leaders. Later blogs will dive deeper into the details.

[Photo: Standing room only at one of the recent Graph breakout sessions at the 2018 Analytics and Data Summit]

Why Use a Graph?

Graph databases provide a distinct advantage over other types of data stores for managing data in use cases such as:

  • Dealing with interlinked, hierarchical or highly connected datasets (e.g. social graphs, product catalogs, classifications and taxonomies, etc.)
  • Integration of heterogeneous data sources, where changes of the data model are required as new data sources are added, or as the application evolves. The “schema-late” approach of graph databases (and other NoSQL databases) is well suited for such cases and significantly reduces the overhead of data model refactoring/ETL. In particular a semantically-rich graph can easily take on the role of a federated “view” across multiple data silos within the enterprise.
  • Relationship-centric cases, where exploring the connections between the nodes of the graph supports data discovery and analytics. This differs from “entity-centric” data modeling, where it is easy to obtain information about a particular entity, but exploring its relationships with other entities requires costly “joins”.

In particular, the more “connected” your data is, the better a graph fits as the way to model it.

[Figure: modeling graph relationships.011]

Why should you care about interconnectedness? Because graphs are literally everywhere: road networks, power grids, biological networks, manufacturing, service, legal, regulatory, social networks (e.g. Facebook, LinkedIn, Twitter, Baidu, Google+), and myriad types of domain-specific taxonomies/ontologies. Graphs are intuitive and extremely flexible compared to other types of data stores. They are easy to navigate, they make it easy to form, discover, or infer a path within the data (without needing joins), and they are natural to visualize. Additionally, graphs do not require a predefined schema, making them uniquely agile.

Perhaps more importantly, implementing a graph requires one to become intimately knowledgeable about the kinds of data and relationships involved in a project, resulting in a form of insightful data documentation that often eludes technical endeavors. Having data in graph form first also makes it easier to identify the proper ML/AI methodologies to use later on.

Graph features supported by Oracle Spatial & Graph

Oracle’s strategy is to enable Spatial & Graph use cases on every platform. It supports both property graphs and RDF-style graphs, and the platform itself spans big data, RDBMS, and cloud.

Property Graph

a.k.a. Labeled Graph. 

A property graph consists of a set of objects or vertices, and a set of arrows or edges connecting the objects. Vertices and edges can have multiple properties, which are represented as key-value pairs.

Each VERTEX has a unique identifier and can have:

  • One or more outgoing edges
  • One or more incoming edges
  • A set of key-value properties

Each EDGE has a unique identifier and can have:

  • A text label that describes the relationship between the two vertices
  • An outgoing vertex
  • An incoming vertex
  • A set of key-value properties
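
To make the vertex and edge definitions above concrete, here is a minimal in-memory sketch in Python. It is purely illustrative: the class and method names (`Vertex`, `Edge`, `PropertyGraph`, `add_vertex`, `add_edge`) are my own assumptions and are not part of any Oracle API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """A vertex: a unique identifier plus key-value properties."""
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed edge: unique identifier, text label, two endpoints, properties."""
    id: int
    label: str
    source: int  # id of the vertex the edge leaves
    target: int  # id of the vertex the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical container used only for illustration (not an Oracle class)."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, vid: int, **props: Any) -> Vertex:
        vertex = Vertex(vid, dict(props))
        self.vertices[vid] = vertex
        return vertex

    def add_edge(self, eid: int, label: str, source: int, target: int,
                 **props: Any) -> Edge:
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(eid, label, source, target, dict(props))
        self.edges[eid] = edge
        return edge

    def out_edges(self, vid: int) -> List[Edge]:
        """Edges that leave the given vertex."""
        return [e for e in self.edges.values() if e.source == vid]

    def in_edges(self, vid: int) -> List[Edge]:
        """Edges that enter the given vertex."""
        return [e for e in self.edges.values() if e.target == vid]


# Made-up sample data: a user vertex, an article vertex, one labeled edge.
graph = PropertyGraph()
graph.add_vertex(1, type="user", name="Michael")
graph.add_vertex(2, type="article", title="Intro to Graphs")
graph.add_edge(100, "published", 1, 2, date="2018-03-26")

for edge in graph.out_edges(1):
    print(edge.label, edge.properties)
```

Note that the properties live on the edge as well as on the vertices, which is what lets a property graph carry facts such as a publication date on the relationship itself.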

Property graphs are ideal for calculating shortest path, centrality, ranking, and recommendations. There are two flavors of Property Graph here at Oracle: Spatial & Graph (S&G), backed by Oracle RDBMS, and Big Data Spatial & Graph (BDS&G), backed by Hadoop and/or NoSQL.
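
For readers who have not seen these calculations before, the sketch below shows two of them on a tiny made-up "knows" graph: a fewest-hops shortest path found by breadth-first search, and in-degree as a very simple centrality measure. The data and function names are hypothetical, and this is plain Python rather than anything Oracle ships.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical directed "knows" graph stored as adjacency lists.
KNOWS: Dict[str, List[str]] = {
    "Paul": ["Amber", "Carl"],
    "Amber": ["Dana"],
    "Carl": ["Dana"],
    "Dana": ["Eve"],
    "Eve": [],
}


def shortest_path(graph: Dict[str, List[str]], start: str,
                  goal: str) -> Optional[List[str]]:
    """Breadth-first search: return a fewest-hops path, or None if unreachable."""
    if start == goal:
        return [start]
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor in parents:
                continue  # already reached by a path that is at least as short
            parents[neighbor] = current
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


def in_degree(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Count incoming edges per vertex (a very simple centrality measure)."""
    counts = {vertex: 0 for vertex in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


print(shortest_path(KNOWS, "Paul", "Eve"))
print(in_degree(KNOWS))
```

Both functions only ever follow edges from a vertex to its neighbors, which is the traversal style a graph store is built around.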

Parallel In-Memory Graph Analytics (PGX)

Part of the Property Graph offering, Oracle’s PGX is a toolkit for graph analysis — both for running algorithms such as PageRank against graphs and for performing SQL-like pattern-matching against graphs, using the results of algorithmic analysis. PGX provides over 40 built-in analytic functions and the ability to design and compile your own custom algorithms via Green-Marl. In addition, there are features for filtering, sorting, simplifying, and extracting subgraphs. Such mutated graphs can be saved for later use.
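
If PageRank is new to you, the following sketch shows the idea behind it using simple power iteration. To be clear, this is not PGX code and does not use the PGX API. The function name, the damping factor of 0.85, the fixed iteration count, and the sample graph are all assumptions chosen for illustration, and the code assumes every link target also appears as a key in the dictionary.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Illustrative PageRank by power iteration (not the PGX implementation)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            if graph[v]:
                # Each vertex passes an equal share of its rank to its targets.
                share = damping * rank[v] / len(graph[v])
                for target in graph[v]:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up link graph: everything points at "hub".
LINKS: Dict[str, List[str]] = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": [],
}

ranks = pagerank(LINKS)
for vertex, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(vertex, round(score, 3))
```

An engine like PGX runs this kind of algorithm in parallel and in memory over far larger graphs; the sketch is only meant to show what is being computed.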

RDF Triplestore

RDF = Resource Description Framework. 

An RDF Triplestore is a framework for modeling any type of data, for describing that data through one or more vocabularies, and for interoperating that data via shared ontologies and schema. All data is stored as “triples” (a.k.a. “tuples”), as follows:

SUBJECT -> PREDICATE -> OBJECT.

Subjects and predicates are URIs (a subject may also be a blank node). Objects can be either URIs or literal values. Not surprisingly, an RDF Triplestore is much more atomic than a property graph. Predicates can be formal and globally shareable, or you can design your own to be local only to your application. Examples of global ontology namespaces include SKOS, DC, DCTERMS, OWL, RDF, RDFS, SWC, FOAF, and others (see https://www.w3.org/2006/07/SWD/Vocab/principles for a description of those namespaces).

RDF Triplestores are ideal for inferring data (e.g. the father of my father is my grandfather), data interoperability, and projects that combine graph processing with set processing.
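
The grandfather example can be sketched in a few lines of Python. This is a hand-coded single rule over plain tuples, not how an RDF Triplestore actually performs inferencing (a real store applies rule sets such as RDFS or OWL for you). The `http://example.org/family#` namespace, the predicate names, and the people are all made up for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

EX = "http://example.org/family#"  # hypothetical namespace

triples: Set[Triple] = {
    (EX + "alice", EX + "hasFather", EX + "bob"),
    (EX + "bob", EX + "hasFather", EX + "charles"),
    (EX + "charles", EX + "hasFather", EX + "david"),
}


def infer_grandfathers(facts: Set[Triple]) -> Set[Triple]:
    """Apply one rule: (x hasFather y) and (y hasFather z) => (x hasGrandfather z).

    Returns only the triples that are not already present in `facts`.
    """
    father_of = {(s, o) for s, p, o in facts if p == EX + "hasFather"}
    inferred: Set[Triple] = set()
    for child, father in father_of:
        for person, grandfather in father_of:
            if person == father:
                inferred.add((child, EX + "hasGrandfather", grandfather))
    return inferred - facts


new_triples = infer_grandfathers(triples)
for triple in sorted(new_triples):
    print(triple)
```

The inferred triples were never loaded; they follow from the data plus the rule, which is the essence of what inferencing adds.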

RDF is only available for Oracle RDBMS.

Further reading comparing Property Graphs vs. RDF Triplestores

For further reading on comparing the features and usage of RDF Triple Stores versus Property Graphs the following links are provided for your reading pleasure:

Comparing Spatial & Graph graph features

| Feature | RDF Triplestore | Property Graph |
|---|---|---|
| Supports Oracle RDBMS | Yes | Yes |
| Supports NoSQL, Hadoop, etc. | No | Yes (requires Big Data S&G) |
| Query Language | SPARQL, OWL inferencing | PGQL & PGX (for in-memory analytics) |
| W3C standard | Yes | No |
| APIs | Java, REST, SQL | REST, SQL, Oracle Spatial and Graph Property Graph Java APIs, TinkerPop Blueprints Java APIs, Oracle Database Property Graph Java APIs |
| Primary Use Case | Linked Data, Semantic Search, Inferencing | Social Network Analysis |
| Graph Model | Data Federation, Knowledge Representation, Semantic Web | Path Analytics, Social Network Analysis, Entity Analytics |
| Primary Industry Domain | Life Sciences, Health Care, Publishing, Finance, Networks & Communications, Defense & Intelligence | Financial, Retail Marketing, Social Media, Smart Manufacturing |
| Analytics Integrations | OBIEE, Oracle R Enterprise, Oracle Data Mining | Built-in parallel in-memory analytics engine, Apache Spark, Groovy shell |
| ETL tools | Direct Mapping, RDB2RDF (RDB to RDF Mapping Language), Apache Jena app development | .opv & .ope files, JDBC-Based Data Loading, External Table-Based Data Loading, SQL*Loader-Based Data Loading |
| SQL language extension | SEM_MATCH | |
| Graph Visualization | Cytoscape, Tom Sawyer Perspectives | Cytoscape, Tom Sawyer Perspectives |
| Data Formats | | GraphML, GraphSON, GML, Oracle Flat File Format (.opv & .ope files) |
| Overall Strengths | Edge-centric. Formal theoretical foundation. Precise. Lots of standards. | Property-centric. Easy to learn. Suitable for social network analysis. |
| Overall Weaknesses | Steep learning curve. Hidden complexities. | Lack of a standard query language. Hard to deal with multiple property graphs. |
| Operational advantage | Inference, data interoperability, combined graph processing and set processing at the same time | Calculating shortest path, centrality, ranking, recommendations |

A Graph Modeling Comparison:

One thing that stands out immediately is that RDF is more atomic and verbose (it is also much more formal, enforcing class/subclass on the model). On the other hand, querying RDF via SPARQL is straightforward. In contrast, Oracle’s PGQL approach is to make the query language approachable to anyone familiar with SQL.

RDF Graph Model

[Figure: Graph query 2]

SPARQL. W3C standard language.

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?knows
    WHERE
       { ?x foaf:name ?name .
         ?x foaf:knows ?knows }

Property Graph Model

[Figure: Graph query 1]

PGQL. Not standard, but much closer to SQL (as compared to Cypher or Gremlin).

    SELECT y.name, e, p
    FROM snGraph IN {SELECT…FROM…WHERE}
    WHERE
      (x WITH name = 'Paul') -[e:knows]-> (y),
      (z WITH name = 'Amber') -/p:knows*/-> (y),
      x.age > y.age
    GROUP BY
    ORDER BY
    LIMIT
    OFFSET
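
To show what the SPARQL query is actually asking for, here is a rough Python sketch that evaluates the same two triple patterns (`?x foaf:name ?name` and `?x foaf:knows ?knows`) against a tiny in-memory list of triples. It is a naive nested-loop matcher written for illustration, not how a SPARQL engine is implemented, and the `http://example.org/` resources are invented sample data.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

FOAF = "http://xmlns.com/foaf/0.1/"

# Made-up sample data.
DATA: List[Triple] = [
    ("http://example.org/paul", FOAF + "name", "Paul"),
    ("http://example.org/paul", FOAF + "knows", "http://example.org/amber"),
    ("http://example.org/amber", FOAF + "name", "Amber"),
]


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # constant term that does not match
    return result


def query(patterns: List[Triple], data: Iterable[Triple]) -> List[Binding]:
    """Join the patterns one after another, keeping consistent bindings."""
    triples = list(data)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


rows = query(
    [("?x", FOAF + "name", "?name"), ("?x", FOAF + "knows", "?knows")],
    DATA,
)
result = [(row["?name"], row["?knows"]) for row in rows]
print(result)
```

The shared variable `?x` is what ties the two patterns together: only subjects that have both a name and a `knows` edge survive the join.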


   

Can I use both Graph types in the same application?

Yes! The diagram below shows how leveraging RDF semantics and 3rd party, domain-specific ontologies can enrich your Property Graph solution. This is done via RDF Views which the Property Graph consumes. Note that mapping multi-values can be a challenge. In such a design, the primary graph would be the Property Graph with the RDF acting as an enrichment support layer.

Note also that one could alternatively leverage a Property Graph View in an RDF Triple Store if appropriate.

[Figure: modeling graph relationships.012]

A typical pattern for using Oracle Spatial & Graph with your bespoke application

Unless your application has a specific requirement for data interoperability, set calculations, and/or semantics, most application developers and clients will likely find the OS&G Property Graph the richer and easier environment to get their heads around. In particular, the ability to leverage Oracle’s parallel, in-memory graph analytics is a huge win for almost any application. So what might be considered best practices when using a property graph?

Perhaps the number one best practice would be to consider using the graph as a single source of truth for all (federated) metadata across all data silos in the enterprise. Examples of such metadata might include:

  • Users & Groups
  • Relationships between users, graph data, and graph operations
    • example: user[Michael] -> published[2018-03-26] -> article[123]
  • Relationships between the graph and external resources (e.g. an image or video)
  • And all other “class-like” metadata including (but not limited to):
    • Facets
    • Taxonomies
    • Ontologies
    • Tags
    • Categories
    • Segments
    • Personas
    • Languages/locales

Given the above, that would leave the RDBMS or Big Data data store(s) responsible for mostly unstructured data such as HTML fragments, images, videos, binaries, XML/JSON, etc., as well as highly mutable data such as custom prices and quantities. Please note that ML/AI can be applied to this “external-to-the-graph” data to extract additional implicit metadata, for example through semantic classification, image tagging, or sentiment analysis. This derived metadata can then be iteratively fed back into the graph to continuously enrich it, increasingly benefitting both business users and consumers.

We thus move from this kind of rigid (but familiar) data architecture:

[Figure: modeling graph relationships.010]

…to a more agile, more pluggable, and semantically richer data architecture like this:

[Figure: modeling graph relationships.009]

Fronting the graph layers with a properly-architected data access layer will hide any differences between Graph and RDB access methods from your core application. Be aware that your DAL will likely need to orchestrate CRUD-ops to the backend as well, in particular kicking off ETL (if required). Note that Oracle supports graph views of relational data and for certain projects this can be ideal in that there is no need to copy the data or use ETL techniques. There is much to explore in later blogs about best practices here.
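
As a rough illustration of the data access layer idea, the sketch below puts a single class in front of two stand-in stores: one holding graph edges (metadata) and one holding content. Every name here (`InMemoryGraphStore`, `InMemoryContentStore`, `ArticleAccessLayer`, `publish`, `articles_by`) is hypothetical, and both stores are in-memory placeholders for the real graph and RDBMS/Big Data back ends.

```python
from typing import Dict, List, Tuple


class InMemoryGraphStore:
    """Stand-in for the graph layer: holds (source, label, target) edges."""

    def __init__(self) -> None:
        self.edges: List[Tuple[str, str, str]] = []

    def add_edge(self, source: str, label: str, target: str) -> None:
        self.edges.append((source, label, target))

    def targets(self, source: str, label: str) -> List[str]:
        return [t for s, l, t in self.edges if s == source and l == label]


class InMemoryContentStore:
    """Stand-in for the RDBMS or big data store holding unstructured content."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def put(self, key: str, body: str) -> None:
        self.rows[key] = body

    def get(self, key: str) -> str:
        return self.rows[key]


class ArticleAccessLayer:
    """Hypothetical DAL: the application calls this and never sees either store."""

    def __init__(self, graph: InMemoryGraphStore,
                 content: InMemoryContentStore) -> None:
        self.graph = graph
        self.content = content

    def publish(self, user: str, article_id: str, body: str) -> None:
        # One call from the application fans out to both back ends.
        self.content.put(article_id, body)
        self.graph.add_edge(user, "published", article_id)

    def articles_by(self, user: str) -> List[str]:
        # The graph answers "which articles?"; the content store supplies bodies.
        article_ids = self.graph.targets(user, "published")
        return [self.content.get(article_id) for article_id in article_ids]


dal = ArticleAccessLayer(InMemoryGraphStore(), InMemoryContentStore())
dal.publish("Michael", "article-123", "<p>Hello graphs</p>")
print(dal.articles_by("Michael"))
```

The design choice being illustrated is that relationship questions go to the graph while bulky or highly mutable content stays in the other store, and the calling code does not need to know which is which.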

I intend to expand on graph patterns and best practices in upcoming A-Team blogs. Your feedback is welcome. Graph technology is very, very cool and you should really consider adding it to your next project.

Learn more about Oracle Spatial & Graph:

Data sheet https://www.oracle.com/assets/spatial-and-graph-ds-1738135.pdf
Developer’s Guide https://docs.oracle.com/en/database/oracle/oracle-database/12.2/spatl/toc.htm
Special Interest Group http://www.ioug.org/p/cm/ld/fid=148&gid=439
Google Plus community https://plus.google.com/communities/108078829007193480508
Spatial & Graph blog https://blogs.oracle.com/oraclespatial/
Oracle Developer Community https://community.oracle.com/community/database/oracle-database-options/spatial
Otube (Graph Enthusiast Channel) https://otube.oracle.com/channel/Graph%2BEnthusiast%2BCommunity/2435
PGX (Oracle Technology Network) http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html
PGX Developer’s Guide https://docs.oracle.com/cd/E56133_01/2.4.0/reference/overview/index.html
W3C RDF homepage https://www.w3.org/RDF/
Sample ETL (blog) https://blogs.oracle.com/bigdataspatialgraph/from-relational-tables-to-property-graph
