X

Best Practices from Oracle Development's A‑Team

Property Graph from Relational - Spatial and Graph part 3

 

Articles in this series:

https://www.ateam-oracle.com/rdf-view-from-relational-spatial-and-graph-part-1
https://www.ateam-oracle.com/rdf-model-from-rdf-view-spatial-and-graph-part-2
https://www.ateam-oracle.com/property-graph-from-relational-spatial-and-graph-part-3
https://www.ateam-oracle.com/visualize-property-graph-with-cytoscape-spatial-and-graph-part-4
https://www.ateam-oracle.com/oracle-20c-pgx-and-graphviz-spatial-and-graph-part-5

Introduction

This article will demonstrate converting relational data to property graph using SQL. We will take the Northwinds relational data we previously loaded, and create a property graph. We will only use a subset of the Northwinds tables for simplicity. We will use the PRODUCTS, CATEGORIES and SUPPLIERS tables.

Prerequisites

This document describes steps with Oracle Enterprise Edition 18.3.0.0.0 on Oracle Linux 7.7

If you are using any other version of the DB, the steps may be different. It is assumed you have already followed the prerequisites and steps in other articles in this series.

Property graphs

The NWUSER needs additional grants for property graphs. In sqlplus as sysdba or system, in the nwpdb:

grant create view to nwuser;
grant create table to nwuser;
grant create any index to nwuser;

Mapping tables

Vertex IDs should be integers. It is not mandatory, but it is easier to work with, and will perform better. To demonstrate how to create integer IDs, we will add a mapping table for the CUSTOMERS table as it contains string customer IDs. The mapping tables will give us a unique Integer ID to map to the customer ID. We will not be using the CUSTOMERS table to load our graph. This is just an example in case you have data of your own that needs integer ID's generated.

As the NWUSER:

create table nw_customer_idmap (    
    mycustid NUMBER GENERATED ALWAYS as IDENTITY(START with 10 INCREMENT by 10),
    customerid VARCHAR2(40),
    gender VARCHAR2(10)
);

Populate the nw_customer_idmap table we just created, and add a random gender. We are taking the string customerid from the CUSTOMER table, and inserting it in our map table.

insert into nw_customer_idmap (customerid, gender) select distinct customerid
    from CUSTOMERS;

This can be used with either a join, or a view to populate the graph tables. Views will be demonstrated below.

Create a property graph named NWPG

This step will create the property graph tables and indexes. As the NWUSER:

SQL> EXECUTE OPG_APIS.CREATE_PG('NWPG', 4, 8, '');

If you look at your list of tables in the NWUSER schema, you should now see NWPGVT$, NWPGGE$, etc...

Screenshot example:

Explanation of vertex and edge tables

Vertex table

Name                       Null?    Type
------------------------   -------- ----------------------------
 VID                       NOT NULL NUMBER
 K                                  NVARCHAR2(3100)
 T                                  NUMBER(38)
 V                                  NVARCHAR2(15000)
 VN                                 NUMBER
 VT                                 TIMESTAMP(6) WITH TIME ZONE
 SL                                 NUMBER
 VTS                                DATE
 VTE                                DATE
 FE                                 NVARCHAR2(4000)

VID is the vertex ID. This is our integer field we created in our customer mapping table.

K is the property key. Think of this as the name you want to give the vertex.

VID and K together, as a type of composite key, must be unique.

T is the type of property you put into K. Is it a String, Integer, etc... A list of values will be provided below.

V is the value of the property, K, if it is text. Note that numeric fields can be added as both text and numeric.

VN is the value of the property, K, if it is numeric.

VT is the value of the property, K, if it is a date/time field.

SL is the security label for use with Oracle Label.

VTS, VTE and FE are reserved columns.

Edge table

 Name                       Null?    Type
 -------------------------- -------- ----------------------------
 EID                        NOT NULL NUMBER
 SVID                       NOT NULL NUMBER
 DVID                       NOT NULL NUMBER
 EL                                  NVARCHAR2(3100)
 K                                   NVARCHAR2(3100)
 T                                   NUMBER(38)
 V                                   NVARCHAR2(15000)
 VN                                  NUMBER
 VT                                  TIMESTAMP(6) WITH TIME ZONE
 SL                                  NUMBER
 VTS                                 DATE
 VTE                                 DATE
 FE                                  NVARCHAR2(4000)

EID and K are composite valuse must be unique for every entry.

EID is the edge ID

SVID is the Source Vertex ID

DVID is the Destination Vertex ID

EL is the edge label

K is the Property key T is the type of property you put into K. Is it a String, Integer, etc... A list of values will be provided below.

V is the value of the property, K, if it is text. Note that numeric fields can be added as both text and numeric.

VN is the value of the property, K, if it is numeric.

VT is the value of the property, K, if it is a date/time field.

SL is the security label for use with Oracle Label.

VTS, VTE and FE are reserved columns.

Values for T

T tells the graph what kind of property you added to the vertex. Is it a String, Integer, Date etc... Values of T and their meaning are as follows: Data type Value

Data type               Value
---------               -----
STRING	                1
INTEGER	                2
FLOAT	                3
DOUBLE	                4
DATE	                5
LOCAL_DATE	            5
TIME	                5
TIMESTAMP	            5
TIME_WITH_TIMEZONE	    5
TIMESTAMP_WITH_TIMEZONE	5
BOOLEAN	                6
LONG	                7
POINT2D	                20

Populate the vertex data

The SQL being shown can be combined into a single execute using unions. It is broken out here to explicitly cover all the steps. It is also not necessary to create views first. This is being done to help visualize the process.

Execute SQL as the NWUSER.

Check unique ID's

Since we are adding vertex data from multiple tables, we need to be sure each VID is unique across all the tables. There is no constraint on the uniqueness of VID since you can have multiple properties per vertex.

SELECT 'category ID' as id, MIN(categoryid) as minID, MAX(categoryid) as maxID, (MAX(categoryid) - MIN(categoryid)) as range, COUNT(DISTINCT categoryid) as uniqueID, COUNT(*) as totalRows FROM categories
UNION ALL
SELECT 'product ID' as id, MIN(productid) as minID, MAX(productid) as maxID, (MAX(productid) - MIN(productid)) as range, COUNT(DISTINCT productid) as uniqueID, COUNT(*) as totalRows FROM products
UNION ALL
SELECT 'supplier ID' as id, MIN(supplierid) as minID, MAX(supplierid) as maxID, (MAX(supplierid) - MIN(supplierid)) as range, COUNT(DISTINCT supplierid) as uniqueID, COUNT(*) as totalRows FROM suppliers

Output from checking the ID's and their ranges across all our tables looks like this:

ID		 MINID	    MAXID      RANGE   UNIQUEID  TOTALROWS
----------- ---------- ---------- ---------- ---------- ----------
category ID	     1		8	   7	      8 	 8
product ID	     1	       77	  76	     77 	77
supplier ID	     1	       29	  28	     29 	29

There will be overlap between ID's across tables. For example, a category and a product can both have an ID of 7.

A unique ID can be created across all the tables in several ways. Depending on your data, you can concatenate columns, add a new column with sequences, use a sequence table, prefix all your id's with an integer value large enough for separation across tables, and more.

We will expose views with unique ID's for each table for this example.

Creating vertex views


create or replace view nw_category_view as
SELECT
    categories.categoryid,
    categories.categoryid + 100 as mycatid,
    categories.categoryname,
    categories.description
FROM
    categories;
    
create or replace view nw_product_view as
SELECT
    products.productid,
    products.productid + 200 as myprodid,
    products.productname,
    products.supplierid + 300 as supplierid,
    products.categoryid + 100 as categoryid,
    products.quantityperunit,
    products.unitprice,
    products.unitsinstock,
    products.unitsonorder,
    products.reorderlevel,
    products.discontinued
FROM
    products;  
    
create or replace view nw_supplier_view as
SELECT
    suppliers.supplierid,
    suppliers.supplierid + 300 as mysuppid,
    suppliers.companyname,
    suppliers.contactname,
    suppliers.contacttitle,
    suppliers.address,
    suppliers.city,
    suppliers.region,
    suppliers.postalcode,
    suppliers.country,
    suppliers.phone,
    suppliers.fax,
    suppliers.homepage
FROM
    suppliers;    

Run our unique ID check again using the views and the my*id we added.

SELECT 'category ID' as id, MIN(mycatid) as minID, MAX(mycatid) as maxID, (MAX(mycatid) - MIN(mycatid)) as range, COUNT(DISTINCT mycatid) as uniqueID, COUNT(*) as totalRows FROM nw_category_view
UNION ALL
SELECT 'product ID' as id, MIN(myprodid) as minID, MAX(myprodid) as maxID, (MAX(myprodid) - MIN(myprodid)) as range, COUNT(DISTINCT myprodid) as uniqueID, COUNT(*) as totalRows FROM nw_product_view
UNION ALL
SELECT 'supplier ID' as id, MIN(mysuppid) as minID, MAX(mysuppid) as maxID, (MAX(mysuppid) - MIN(mysuppid)) as range, COUNT(DISTINCT mysuppid) as uniqueID, COUNT(*) as totalRows FROM nw_supplier_view

Output looks like this:

ID		 MINID	    MAXID      RANGE   UNIQUEID  TOTALROWS
----------- ---------- ---------- ---------- ---------- ----------
category ID	   101	      108	   7	      8 	 8
product ID	   201	      277	  76	     77 	77
supplier ID	   301	      329	  28	     29 	29

First is adding categories. Note that we are adding the categoryid(mycatid) as both a string and integer. Also note the use of to_nchar. This is to ensure data is converted to the correct character set if required. If you have non ascii/double byte data. It doesn't apply in this specific statement, but will be used in all our inserts for consistency. The use of distinct is a safety net. You may encounter cases with other datasets where an id field is not unique for various reasons. If you attempt to enter the same VID/K key more than once, you will get a constraint violation. We add the categoryid, category name, and category description.

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT) 
select 
    distinct mycatid as VID,
    'mycatid' as K,
    2 as T,
    to_nchar(mycatid) AS V,
    mycatid as VN,
    NULL as VT
from NW_CATEGORY_VIEW;


INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT) 
select 
    distinct mycatid as VID,
    'categoryName' as K,
    1 as T,
    to_nchar(categoryname) AS V,
    NULL as VN,
    NULL as VT
from NW_CATEGORY_VIEW;

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT) 
select 
    distinct mycatid as VID,
    'categoryDescription' as K,
    1 as T,
    to_nchar(description) AS V,
    NULL as VN,
    NULL as VT
from NW_CATEGORY_VIEW;

Add the products. We are using a subset of the available columns in the source table for simplicity.

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT)
select 
    distinct myprodid as VID,
    'myprodid' as K,
    2 as T,
    to_nchar(myprodid) AS V,
    myprodid as VN,
    NULL as VT
from NW_PRODUCT_VIEW;

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT)
select 
    distinct myprodid as VID,
    'productName' as K,
    1 as T,
    to_nchar(productname) AS V,
    NULL as VN,
    NULL as VT
from NW_PRODUCT_VIEW;

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT)
select 
    distinct myprodid as VID,
    'supplierId' as K,
    2 as T,
    to_nchar(supplierid) AS V,
    supplierid as VN,
    NULL as VT
from NW_PRODUCT_VIEW;

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT)
select 
    distinct myprodid as VID,
    'categoryId' as K,
    2 as T,
    to_nchar(categoryid) AS V,
    categoryid as VN,
    NULL as VT
from NW_PRODUCT_VIEW;

Add supplier data. We are using a subset of the available columns in the source table for simplicity.

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT)
select 
    distinct mysuppid as VID,
    'mysuppid' as K,
    2 as T,
    to_nchar(mysuppid) AS V,
    mysuppid as VN,
    NULL as VT
from NW_SUPPLIER_VIEW;

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT)
select 
    distinct mysuppid as VID,
    'companyName' as K,
    1 as T,
    to_nchar(companyname) AS V,
    NULL as VN,
    NULL as VT
from NW_SUPPLIER_VIEW;

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT)
select 
    distinct mysuppid as VID,
    'city' as K,
    1 as T,
    to_nchar(city) AS V,
    NULL as VN,
    NULL as VT
from NW_SUPPLIER_VIEW;

INSERT INTO NWPGVT$ (VID, K, T, V, VN, VT)
select 
    distinct mysuppid as VID,
    'country' as K,
    1 as T,
    to_nchar(country) AS V,
    NULL as VN,
    NULL as VT
from NW_SUPPLIER_VIEW;

Populate edge data

Create edge views

This step is not required, you could just populate the GE$ table directly with SQL. The view is being created to provide a visual aid for how the data is mapped.

The edge id (EID) must be unique across all the data. As with the vertex, there is not a constraint on this. There is however a unique constraint on the composite EID/K. We know our ID's from our vertex views are unique. We can use those to create a concatenated value that gives us a unique EID. For example, supplier 301 supplies product 201. We can combine these ID's to give us a unique EID of 301201.

create or replace view NW_PRODNAME_EDGE as
select supplierid || myprodid as eid,
supplierid as svid,
myprodid as dvid,
'supplies' as el,
'product' as k,
1 as t,
productname as v
from NW_PRODUCT_VIEW

create or replace view NW_CATNAME_EDGE as
SELECT
    nw_product_view.categoryid || nw_product_view.myprodid AS eid,
    nw_product_view.myprodid      AS svid,
    nw_product_view.categoryid    AS dvid,
    'belongsTo' AS el,
    'category' AS k,
    1 AS t,
    nw_category_view.categoryname   AS v
FROM
    nw_product_view
    INNER JOIN nw_category_view ON nw_category_view.mycatid = nw_product_view.categoryid

Populate edge data

As the NWUSER:

INSERT INTO NWPGGE$ (EID, SVID, DVID, EL, K, T, V, VN, VT)
SELECT
    nw_prodname_edge.eid,
    nw_prodname_edge.svid,
    nw_prodname_edge.dvid,
    nw_prodname_edge.el,
    nw_prodname_edge.k,
    nw_prodname_edge.t,
    to_nchar(nw_prodname_edge.v) as v,
    NULL AS vn,
    NULL AS vt
FROM
    nw_prodname_edge
    
INSERT INTO NWPGGE$ (EID, SVID, DVID, EL, K, T, V, VN, VT)
SELECT
    nw_catname_edge.eid,
    nw_catname_edge.svid,
    nw_catname_edge.dvid,
    nw_catname_edge.el,
    nw_catname_edge.k,
    nw_catname_edge.t,
    to_nchar(nw_catname_edge.v) as v,
    NULL AS vn,
    NULL AS vt
FROM
    nw_catname_edge    

Screenshot of graph visualization. We will go over visualizing in the next article.

 

References

https://docs.oracle.com/en/database/oracle/oracle-database/18/index.html

https://docs.oracle.com/en/database/oracle/oracle-database/18/spgdg/index.html

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha