4 SQL-Based Property Graph Query and Analytics
You can use SQL to query property graph data in Oracle Spatial and Graph.
For the property graph support in Oracle Spatial and Graph, all the vertices and edges data are persisted in relational form in Oracle Database. For detailed information about the Oracle Spatial and Graph property graph schema objects, see Property Graph Schema Objects for Oracle Database.
This chapter provides examples of typical graph queries implemented using SQL. The audience includes DBAs as well as application developers who understand SQL syntax and property graph schema objects.
The benefits of querying directly property graph using SQL include:
-
There is no need to bring data outside Oracle Database.
-
You can leverage the industry-proven SQL engine provided by Oracle Database.
-
You can easily join or integrate property graph data with other data types (relational, JSON, XML, and so on).
-
You can take advantage of existing Oracle SQL tuning and database management tools and user interface.
The examples assume that there is a property graph named connections
in the current schema. The SQL queries and example output are for illustration purpose only, and your output may be different depending on the data in your connections
graph. In some examples, the output is reformatted for readability.
Topics:
- Simple Property Graph Queries
The examples in this topic query vertices, edges, and properties of the graph. - Text Queries on Property Graphs
If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column. - Navigation and Graph Pattern Matching
A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends. - Navigation Options: CONNECT BY and Parallel Recursion
The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying. - Pivot
The PIVOT clause lets you dynamically add columns to a table to create a new table. - SQL-Based Property Graph Analytics
In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.
4.1 Simple Property Graph Queries
The examples in this topic query vertices, edges, and properties of the graph.
Example 4-1 Find a Vertex with a Specified Vertex ID
This example find the vertex with vertex ID 1 in the connections
graph.
SQL> select vid, k, v, vn, vt
from connectionsVT$
where vid=1;
The output might be as follows:
1 country United States 1 name Barack Obama 1 occupation 44th president of United States of America ...
Example 4-2 Find an Edge with a Specified Edge ID
This example find the edge with edge ID 100 in the connections
graph.
SQL> select eid,svid,dvid,k,t,v,vn,vt
from connectionsGE$
where eid=1000;
The output might be as follows:
1000 1 2 weight 3 1 1
In the preceding output, the K of the edge property is "weight" and the type ID of the value is 3, indicating a float value.
Example 4-3 Perform Simple Counting
This example performs simple counting in the connections
graph.
SQL> -- Get the total number of K/V pairs of all the vertices
SQL> select /*+ parallel */ count(1)
from connectionsVT$;
299
SQL> -- Get the total number of K/V pairs of all the edges
SQL> select /*+ parallel(8) */ count(1)
from connectionsGE$;
164
SQL> -- Get the total number of vertices
SQL> select /*+ parallel */ count(distinct vid)
from connectionsVT$;
78
SQL> -- Get the total number of edges
SQL> select /*+ parallel */ count(distinct eid)
from connectionsGE$;
164
Example 4-4 Get the Set of Property Keys Used
This example gets the set of property keys used for the vertices n the connections
graph.
SQL> select /*+ parallel */ distinct k
from connectionsVT$;
company
show
occupation
type
team
religion
criminal charge
music genre
genre
name
role
political party
country
13 rows selected.
SQL> -- get the set of property keys used for edges
SQL> select /*+ parallel */ distinct k
from connectionsGE$;
weight
Example 4-5 Find Vertices with a Value
This example finds vertices with a value (of any property) that is of String type, and where and the value contains two adjacent occurrences of a, e, i, o, or u, regardless of case.n the connections
graph.
SQL> select vid, t, k, v
from connectionsVT$
where t=1
and regexp_like(v, '([aeiou])\1', 'i');
6 1 name Jordan Peele
6 1 show Key and Peele
54 1 name John Green
...
It is usually hard to leverage a B-Tree index for the preceding kind of query because it is difficult to know beforehand what kind of regular expression is going to be used. For the above query, you might get the following execution plan. Note that full table scan is chosen by the optimizer.
------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 795 | 28 (0)| 00:00:01 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 15 | 795 | 28 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
|* 4 | TABLE ACCESS FULL| CONNECTIONSVT$ | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(INTERNAL_FUNCTION("V") AND REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
- Degree of Parallelism is 2 because of table property
If the Oracle Database In-Memory option is available and memory is sufficient, it can help performance to place the table (full table or a set of relevant columns) in memory. One way to achieve that is as follows:
SQL> alter table connectionsVT$ inmemory;
Table altered.
Now, entering the same SQL containing the regular expression shows a plan that performs a "TABLE ACCESS INMEMORY FULL".
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 795 | 28 (0)| 00:00:01 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 15 | 795 | 28 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
|* 4 | TABLE ACCESS INMEMORY FULL| CONNECTIONSVT$ | 15 | 795 | 28 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(INTERNAL_FUNCTION("V") AND REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
- Degree of Parallelism is 2 because of table property
4.2 Text Queries on Property Graphs
If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column.
Oracle Text can process text that is directly stored in the database. The text can be short strings (such as names or addresses), or it can be full-length documents. These documents can be in a variety of textual format.
The text can also be in many different languages. Oracle Text can handle any space-separated languages (including character sets such as Greek or Cyrillic). In addition, Oracle Text is able to handle the Chinese, Japanese and Korean pictographic languages)
Because the property graph feature uses NVARCHAR typed column for better support of Unicode, it is highly recommended that UTF8 (AL32UTF8) be used as the database character set.
To create an Oracle Text index on the vertices table (or edges table), the ALTER SESSION privilege is required. For example:
SQL> grant alter session to <YOUR_USER_SCHEMA_HERE>;
If customization is required, also grant the EXECUTE privilege on CTX_DDL:
SQL> grant execute on ctx_ddl to <YOUR_USER_SCHEMA_HERE>;
The following shows some example statements for granting these privileges to SCOTT.
SQL> conn / as sysdba
Connected.
SQL> -- This is a PDB setup --
SQL> alter session set container=orcl;
Session altered.
SQL> grant execute on ctx_ddl to scott;
Grant succeeded.
SQL> grant alter session to scott;
Grant succeeded.
Example 4-6 Create a Text Index
This example creates an Oracle Text index on the vertices table (V column) of the connections graph in the SCOTT schema. Note that unlike the text index capabilities provided by either Apache Lucene or Apache SolrCloud, the Oracle Text index created here is for all property keys, not just one or a subset of property keys. In addition, if a new property is added to the graph and the property value is of String data type, then it will automatically be included in the same text index.
The example uses the OPG_AUTO_LEXER lexer owned by MDSYS.
SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'MDSYS', lexer=>'OPG_AUTO_LEXER', dop=>2);
If customization is desired, you can use the ctx_ddl.create_preference API. For example:
SQL> -- The following requires access privilege to CTX_DDL
SQL> exec ctx_ddl.create_preference('SCOTT.OPG_AUTO_LEXER', 'AUTO_LEXER');
PL/SQL procedure successfully completed.
SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'scott', lexer=>'OPG_AUTO_LEXER', dop=>2);
PL/SQL procedure successfully completed.
You can now use a rich set of functions provided by Oracle Text to perform queries against graph elements.
Note:
If you no longer need an Oracle Text index, you can use the drop_vertices_text_idx or opg_apis.drop_edges_text_idx API to drop it. The following statements drop the text indexes on the vertices and edges of a graph named connections
owned by SCOTT:
SQL> exec opg_apis.drop_vertices_text_Idx('scott', 'connections');
SQL> exec opg_apis.drop_edges_text_Idx('scott', 'connections');
Example 4-7 Find a Vertex that Has a Property Value
The following example find a vertex that has a property value (of string type) containing the keyword "Obama".
SQL> select vid, k, t, v
from connectionsVT$
where t=1
and contains(v, 'Obama', 1) > 0
order by score(1) desc
;
The output and SQL execution plan from the preceding statement may appear as follows. Note that DOMAIN INDEX appears as an operation in the execution plan.
1 name 1 Barack Obama
Execution Plan
----------------------------------------------------------
Plan hash value: 1619508090
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 56 | 5 (20) | 00:00:01 | | |
| 1 | SORT ORDER BY | | 1 | 56 | 5 (20) | 00:00:01 | | |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID |
|* 3 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T"=1 AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
3 - access("CTXSYS"."CONTAINS"("V",'Obama',1)>0)
Example 4-8 Fuzzy Match
The following example finds a vertex that has a property value (of string type) containing variants of "ameriian" (a deliberate misspelling for this example) Fuzzy match is used.
SQL> select vid, k, t, v
from connectionsVT$
where contains(v, 'fuzzy(ameriian,,,weight)', 1) > 0
order by score(1) desc;
The output and SQL execution plan from the preceding statement may appear as follows.
8 role 1 american business man 9 role 1 american business man 4 role 1 american economist 6 role 1 american comedian actor 7 role 1 american comedian actor 1 occupation 1 44th president of United States of America 6 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1619508090 ----------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 56 | 5 (20)| 00:00:01 | | | | 1 | SORT ORDER BY | | 1 | 56 | 5 (20)| 00:00:01 | | | |* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID | |* 3 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | | ----------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
Example 4-9 Query Relaxation
The following example is a sophisticated Oracle Text query that implements query relaxation, which enables you to execute the most restrictive version of a query first, progressively relaxing the query until the required number of matches is obtained. Using query relaxation with queries that contain multiple strings, you can provide guidance for determining the “best” matches, so that these appear earlier in the results than other potential matches.
This example searchs for "american actor" with a query relaxation sequence.
SQL> select vid, k, t, v
from connectionsVT$
where CONTAINS (v,
'<query>
<textquery lang="ENGLISH" grammar="CONTEXT">
<progression>
<seq>{american} {actor}</seq>
<seq>{american} NEAR {actor}</seq>
<seq>{american} AND {actor}</seq>
<seq>{american} ACCUM {actor}</seq>
</progression>
</textquery>
<score datatype="INTEGER" algorithm="COUNT"/>
</query>') > 0;
The output and SQL execution plan from the preceding statement may appear as follows.
7 role 1 american comedian actor 6 role 1 american comedian actor 44 occupation 1 actor 8 role 1 american business man 53 occupation 1 actor film producer 52 occupation 1 actor 4 role 1 american economist 47 occupation 1 actor 9 role 1 american business man 9 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2158361449 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 56 | 4 (0)| 00:00:01 | | | |* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$ | 1 | 56 | 4 (0)| 00:00:01 | ROWID | ROWID | |* 2 | DOMAIN INDEX | CONNECTIONSXTV$ | | | 4 (0)| 00:00:01 | | | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V")) 2 - access("CTXSYS"."CONTAINS"("V",'<query> <textquery lang="ENGLISH" grammar="CONTEXT"> <progression> <seq>{american} {actor}</seq> <seq>{american} NEAR {actor}</seq> <seq>{american} AND {actor}</seq> <seq>{american} ACCUM {actor}</seq> </progression> </textquery> <score datatype="INTEGER" algorithm="COUNT"/> </query>')>0)
Example 4-10 Find an Edge
Just as with vertices, you can create an Oracle Text index on the V column of the edges table (GE$) of a property graph. The following example uses the OPG_AUTO_LEXER lexer owned by MDSYS.
SQL> exec opg_apis.create_edges_text_idx('scott', 'connections', pref_owner=>'mdsys', lexer=>'OPG_AUTO_LEXER', dop=>4);
If customization is required, use the ctx_ddl.create_preference API.
4.3 Navigation and Graph Pattern Matching
A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends.
This topic provides graph navigation and pattern matching examples using the example property graph named connections. Most of the SQL statements are relatively simple, but they can be used as building blocks to implement requirements that are more sophisticated. It is generally best to start from something simple, and progressively add complexity.
Example 4-11 Who Are a Person's Collaborators?
The following SQL ststement finds all entities that a vertex with ID 1 collaborates with. For simplicity, it considers only outgoing relationships.
SQL> select dvid, el, k, vn, v
from connectionsGE$
where svid=1
and el='collaborates';
Note:
To find the specific vertex ID of interest, you can perform a text query on the property graph using keywords or fuzzy matching. (For details and examples, see Text Queries on Property Graphs.)
The preceding example's output and execution plan may be as follows.
2 collaborates weight 1 1 21 collaborates weight 1 1 22 collaborates weight 1 1 .... 26 collaborates weight 1 1 10 rows selected. ------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 460 | 2 (0)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 10 | 460 | 2 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 3 | PX PARTITION HASH ALL | | 10 | 460 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 4 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 10 | 460 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 5 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates' AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V")) 5 - access("SVID"=1)
Example 4-12 Who Are a Person's Collaborators and What are Their Occupations?
The following SQL statement finds collaborators of the vertex with ID 1, and the occupation of each collaborator. A join with the vertices table (VT$) is required.
SQL> select dvid, vertices.v
from connectionsGE$, connectionsVT$ vertices
where svid=1
and el='collaborates'
and dvid=vertices.vid
and vertices.k='occupation';
The preceding example's output and execution plan may be as follows.
21 67th United States Secretary of State 22 68th United States Secretary of State 23 chancellor 28 7th president of Iran 19 junior United States Senator from New York ... -------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 525 | 7 (0)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 7 | 525 | 7 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 3 | NESTED LOOPS | | 7 | 525 | 7 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 4 | PX PARTITION HASH ALL | | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 5 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 6 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 7 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 8 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 9 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates') 6 - access("SVID"=1) 8 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 9 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'occupation') filter(INTERNAL_FUNCTION("VERTICES"."K"))
Example 4-13 Find a Person's Enemies and Aggregate Them by Their Country
The following SQL statement finds enemies (that is, those with the feuds
relationship) of the vertex with ID 1, and aggregates them by their countries. A join with the vertices table (VT$) is required.
SQL> select vertices.v, count(1)
from connectionsGE$, connectionsVT$ vertices
where svid=1
and el='feuds'
and dvid=vertices.vid
and vertices.k='country'
group by vertices.v;
The example's output and execution plan may be as follows. In this case, the vertex with ID 1 has 3 enemies in the United States and 1 in Russia.
United States 3 Russia 1 ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 5 | 375 | 5 (20)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | P->S | QC (RAND) | | 3 | HASH GROUP BY | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | PCWP | | | 4 | PX RECEIVE | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,01 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,00 | P->P | HASH | | 6 | HASH GROUP BY | | 5 | 375 | 5 (20)| 00:00:01 | | | Q1,00 | PCWP | | | 7 | NESTED LOOPS | | 5 | 375 | 4 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 8 | PX PARTITION HASH ALL | | 5 | 125 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 9 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$ | 5 | 125 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 10 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 11 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 12 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 13 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 9 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'feuds') 10 - access("SVID"=1) 12 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 13 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country') filter(INTERNAL_FUNCTION("VERTICES"."K"))
Example 4-14 Find a Person's Collaborators, and aggregate and sort them
The following SQL statement finds the collaborators of the vertex with ID 1, aggregates them by their country, and sorts them in ascending order.
SQL> select vertices.v, count(1)
from connectionsGE$, connectionsVT$ vertices
where svid=1
and el='collaborates'
and dvid=vertices.vid
and vertices.k='country'
group by vertices.v
order by count(1) asc;
The example output and execution plan may be as follows. In this case, the vertex with ID 1 has the most collaborators in the United States.
Germany 1 Japan 1 Iran 1 United States 7 --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 750 | 9 (23)| 00:00:01 | | | | | | | 1 | PX COORDINATOR | | | | | | | | | | | | 2 | PX SEND QC (ORDER) | :TQ10002 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | P->S | QC (ORDER) | | 3 | SORT ORDER BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,02 | PCWP | | | 5 | PX SEND RANGE | :TQ10001 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | P->P | RANGE | | 6 | HASH GROUP BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | PCWP | | | 7 | PX RECEIVE | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,01 | PCWP | | | 8 | PX SEND HASH | :TQ10000 | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,00 | P->P | HASH | | 9 | HASH GROUP BY | | 10 | 750 | 9 (23)| 00:00:01 | | | Q1,00 | PCWP | | | 10 | NESTED LOOPS | | 10 | 750 | 7 (0)| 00:00:01 | | | Q1,00 | PCWP | | | 11 | PX PARTITION HASH ALL | | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 12 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | CONNECTIONSGE$ | 10 | 250 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 13 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 20 | | 1 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | | 14 | PARTITION HASH ITERATOR | | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | |* 15 | TABLE ACCESS BY LOCAL INDEX ROWID | CONNECTIONSVT$ | | | | | KEY | KEY | Q1,00 | PCWP | | |* 16 | INDEX UNIQUE SCAN | CONNECTIONSXQV$ | 1 | | 0 (0)| 00:00:01 | KEY | KEY | Q1,00 | PCWP | | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 12 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates') 13 - access("SVID"=1) 15 - filter(INTERNAL_FUNCTION("VERTICES"."V")) 16 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country') filter(INTERNAL_FUNCTION("VERTICES"."K"))
4.4 Navigation Options: CONNECT BY and Parallel Recursion
The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying.
-
CONNECT BY lets you navigate and find matches in a hierarchical order. To follow outgoing edges, you can use prior dvid = svid to guide the navigation.
-
Parallel recursion lets you perform navigation up to a specified number of hops away.
The examples use a property graph named connections.
Example 4-15 CONNECT WITH
The following SQL statement follows the outgoing edges by 1 hop.
SQL> select G.dvid
from connectionsGE$ G
start with svid = 1
connect by nocycle prior dvid = svid and level <= 1;
The preceding example's output and execution plan may be as follows.
2 3 4 5 6 7 8 9 10 ... ------------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 7 | 273 | 3 (67)| 00:00:01 | | | | | | |* 1 | CONNECT BY WITH FILTERING| | | | | | | | | | | | 2 | PX COORDINATOR | | | | | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 2 | 12 | 0 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) | | 4 | PX PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | | |* 5 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | | |* 6 | FILTER | | | | | | | | | | | | 7 | NESTED LOOPS | | 5 | 95 | 1 (0)| 00:00:01 | | | | | | | 8 | CONNECT BY PUMP | | | | | | | | | | | | 9 | PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | | | | |* 10 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | | | | ------------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("SVID"=PRIOR "DVID") filter(LEVEL<=2) 5 - access("SVID"=1) 6 - filter(LEVEL<=2) 10 - access("connect$_by$_pump$_002"."prior dvid "="SVID")
To extend from 1 hop to multiple hops, change 1 in the preceding example to another integer. For example, to change it to 2 hops, specify: level <= 2
Example 4-16 Parallel Recursion
The following SQL statement uses recursion within the WITH clause to perform navigation up to 4 hops away, a using recursively defined graph expansion: g_exp
references g_exp
in the query, and that defines the recursion. The example also uses the PARALLEL optimizer hint for parallel execution.
SQL> WITH g_exp(svid, dvid, depth) as
(
select svid as svid, dvid as dvid, 0 as depth
from connectionsGE$
where svid=1
union all
select g2.svid, g1.dvid, g2.depth + 1
from g_exp g2, connectionsGE$ g1
where g2.dvid=g1.svid
and g2.depth <= 3
)
select /*+ parallel(4) */ dvid, depth
from g_exp
where svid=1
;
The example's output and execution plan may be as follows. Note that CURSOR DURATION MEMORY
is chosen in the execution, which indicates the graph expansion stores the intermediate data in memory.
22 4
25 4
24 4
1 4
23 4
33 4
22 4
22 4
... ...
Execution Plan
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 801 | 31239 | 147 (0)| 00:00:01 | | | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | | | |
| 3 | UNION ALL (RECURSIVE WITH) BREADTH FIRST | | | | | | | | | | |
| 4 | PX COORDINATOR | | | | | | | | | | |
| 5 | PX SEND QC (RANDOM) | :TQ20000 | 2 | 12 | 0 (0)| 00:00:01 | | | Q2,00 | P->S | QC (RAND) |
| 6 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q2,00 | PCWP | |
| 7 | PX PARTITION HASH ALL | | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q2,00 | PCWC | |
|* 8 | INDEX RANGE SCAN | CONNECTIONSXSE$ | 2 | 12 | 0 (0)| 00:00:01 | 1 | 8 | Q2,00 | PCWP | |
| 9 | PX COORDINATOR | | | | | | | | | | |
| 10 | PX SEND QC (RANDOM) | :TQ10000 | 799 | 12M| 12 (0)| 00:00:01 | | | Q1,00 | P->S | QC (RAND) |
| 11 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q1,00 | PCWP | |
|* 12 | HASH JOIN | | 799 | 12M| 12 (0)| 00:00:01 | | | Q1,00 | PCWP | |
| 13 | BUFFER SORT (REUSE) | | | | | | | | Q1,00 | PCWP | |
| 14 | PARTITION HASH ALL | | 164 | 984 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWC | |
| 15 | INDEX FAST FULL SCAN | CONNECTIONSXDE$ | 164 | 984 | 2 (0)| 00:00:01 | 1 | 8 | Q1,00 | PCWP | |
| 16 | PX BLOCK ITERATOR | | | | | | | | Q1,00 | PCWC | |
|* 17 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6614_11CB2D2 | | | | | | | Q1,00 | PCWP | |
| 18 | PX COORDINATOR | | | | | | | | | | |
| 19 | PX SEND QC (RANDOM) | :TQ30000 | 801 | 31239 | 135 (0)| 00:00:01 | | | Q3,00 | P->S | QC (RAND) |
|* 20 | VIEW | | 801 | 31239 | 135 (0)| 00:00:01 | | | Q3,00 | PCWP | |
| 21 | PX BLOCK ITERATOR | | 801 | 12M| 135 (0)| 00:00:01 | | | Q3,00 | PCWC | |
| 22 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6614_11CB2D2 | 801 | 12M| 135 (0)| 00:00:01 | | | Q3,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("SVID"=1)
12 - access("G2"."DVID"="G1"."SVID")
17 - filter("G2"."INTERNAL_ITERS$"=LEVEL AND "G2"."DEPTH"<=3)
20 - filter("SVID"=1)
4.5 Pivot
The PIVOT clause lets you dynamically add columns to a table to create a new table.
The schema design (VT$ and GE$) of the property graph is narrow ("skinny") rather than wide ("fat"). This means that if a vertex or edge has multiple properties, those property keys, values, data types, and so on will be stored using multiple rows instead of multiple columns. Such a design is very flexible in the sense that you can add properties dynamically without having to worry about adding too many columns or even reaching the physical maximum limit of number of columns a table may have. However, for some applications you may prefer to have a wide table if the properties are somewhat homogeneous.
Example 4-17 Pivot
The following CREATE TABLE ... AS SELECT statement uses PIVOT to add four columns: ‘company’,’ occupation’,’ name’, and ‘religion’.
SQL> CREATE TABLE table pg_wide
as
with G AS (select vid, k, t, v
from connectionsVT$
)
select *
from G
pivot (
min(v) for k in ('company', 'occupation', 'name', 'religion')
);
Table created.
The following DESCRIBE statement shows the definition of the new table, including the four added columns. (The output is reformatted for readability.)
SQL> DESCRIBE pg_wide;
Name Null? Type
--------------------------------------------------- -------- --------------------
VID NOT NULL NUMBER
T NUMBER(38)
'company' NVARCHAR2(15000)
'occupation' NVARCHAR2(15000)
'name' NVARCHAR2(15000)
'religion' NVARCHAR2(15000)
4.6 SQL-Based Property Graph Analytics
In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.
SQL-based analytics are especially useful if:
-
The computation is not bound by the physical memory.
-
There is no need to move the graph data outside the database.
-
Computation is easily done against the latest snapshot of a property graph.
-
There is no need to 'sync up' (synchronize) data inside and outside the database.
However, when a graph (or a subgraph) fits in memory, then running analytics provided by the in-memory analyst usually provides better better performance than using SQL-based analytics. Because many of the analytics implementation require using intermediate data structures, most SQL- (PL/SQL-) based analytics APIs have parameters for working tables (wt). A typical flow has the following steps:
-
Prepare the working table or tables.
-
Perform analytics (one or multiple calls).
-
Perform cleanup
Example 4-18 Shortest Path
Consider shortest path, for example. Internally, Oracle Database uses the bidirectional Dijkstra algorithm. The following code snippet shows an end-to-end flow.
set serveroutput on
DECLARE
wt1 varchar2(100); -- intermediate working tables
n number;
path varchar2(1000);
weights varchar2(1000);
BEGIN
-- prepare
opg_apis.find_sp_prep('connectionsGE$', wt1);
dbms_output.put_line('working table name ' || wt1);
-- compute
opg_apis.find_sp(
'connectionsGE$',
1, -- start vertex ID
53, -- destination vertex ID
wt1, -- working table (for Dijkstra expansion)
dop => 1, -- degree of parallelism
stats_freq=>1000, -- frequency to collect statistics
path_output => path, -- shortest path (a sequence of vertices)
weights_output => weights, -- edge weights
options => null
);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- cleanup (commented out here; see text after the example)
-- opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/
This example may produce the following output. Note that if no working table name is provided, the preparation step will automatically generate a temporary table name and create it. Because the temporary working table name uses the session ID, your output will probably be different.
working table name "CONNECTIONSGE$$TWFS12" path 1 3 52 53 weights 4 3 1 1 1 PL/SQL procedure successfully completed.
If you want to know the definition of the working table or tables, then skip the cleanup phase (as shown in the preceding example that comments out the call to find_sp_cleanup
). After the computation is done, you can describe the working table or tables.
SQL> describe "CONNECTIONSGE$$TWFS12"
Name Null? Type
--------- -------- ----------------------------
NID NUMBER
D2S NUMBER
P2S NUMBER
D2T NUMBER
P2T NUMBER
F NUMBER(38)
B NUMBER(38)
For advanced users who want to try different table creation options, such as using inmemory or advanced compression, you can pre-create the preceding working table and pass the name in.
Example 4-19 Create Working Table and Perform Analytics
The following statements first create a working table with the same column structure and basic compression enabled, then pass it to the computation.
create table connections$MY_EXP(
NID NUMBER,
D2S NUMBER,
P2S NUMBER,
D2T NUMBER,
P2T NUMBER,
F NUMBER(38),
B NUMBER(38)
) compress nologging;
DECLARE
wt1 varchar2(100) := 'connections$MY_EXP';
n number;
path varchar2(1000);
weights varchar2(1000);
BEGIN
dbms_output.put_line('working table name ' || wt1);
-- compute
opg_apis.find_sp(
'connectionsGE$',
1,
53,
wt1,
dop => 1,
stats_freq=>1000,
path_output => path,
weights_output => weights,
options => null
);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- cleanup
-- opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/
At the end of the computation, if the working table has not been dropped or truncated, you can check the content of the working table, as follows. Note that the working table structure may vary between releases.
SQL> select * from connections$MY_EXP;
NID D2S P2S D2T P2T F B
---------- ---------- ---------- ---------- ---------- ---------- ----------
1 0 1.000E+100 1 -1
53 1.000E+100 0 -1 1
54 1.000E+100 1 53 -1 1
52 1.000E+100 1 53 -1 1
5 1 1 1.000E+100 0 -1
26 1 1 1.000E+100 0 -1
8 1000 1 1.000E+100 0 -1
3 1 1 2 52 0 0
15 1 1 1.000E+100 0 -1
21 1 1 1.000E+100 0 -1
19 1 1 1.000E+100 0 -1
...
Example 4-20 Multiple Calls to Same Graph
To perform multiple calls to the same graph, only a single call to the preparation step is needed. The following shows an example of computing shortest path for multiple pairs of vertices.
DECLARE
wt1 varchar2(100); -- intermediate working tables
n number;
path varchar2(1000);
weights varchar2(1000);
BEGIN
-- prepare
opg_apis.find_sp_prep('connectionsGE$', wt1);
dbms_output.put_line('working table name ' || wt1);
-- find shortest path from vertex 1 to vertex 53
opg_apis.find_sp( 'connectionsGE$', 1, 53,
wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- find shortest path from vertex 2 to vertex 36
opg_apis.find_sp( 'connectionsGE$', 2, 36,
wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- find shortest path from vertex 30 to vertex 4
opg_apis.find_sp( 'connectionsGE$', 30, 4,
wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
dbms_output.put_line('path ' || path);
dbms_output.put_line('weights ' || weights);
-- cleanup
opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/
The example's output may be as follows: three shortest paths have been found for the multiple pairs of vertices provided.
working table name "CONNECTIONSGE$$TWFS12" path 1 3 52 53 weights 4 3 1 1 1 path 2 36 weights 2 1 1 path 30 21 1 4 weights 4 3 1 1 1 PL/SQL procedure successfully completed.