SQL-Based Property Graph Query and Analytics

4 SQL-Based Property Graph Query and Analytics

You can use SQL to query property graph data in Oracle Spatial and Graph.

For the property graph support in Oracle Spatial and Graph, all the vertices and edges data are persisted in relational form in Oracle Database. For detailed information about the Oracle Spatial and Graph property graph schema objects, see Property Graph Schema Objects for Oracle Database.

This chapter provides examples of typical graph queries implemented using SQL. The audience includes DBAs as well as application developers who understand SQL syntax and property graph schema objects.

The benefits of querying directly property graph using SQL include:

There is no need to bring data outside Oracle Database.

You can leverage the industry-proven SQL engine provided by Oracle Database.

You can easily join or integrate property graph data with other data types (relational, JSON, XML, and so on).

You can take advantage of existing Oracle SQL tuning and database management tools and user interface.

The examples assume that there is a property graph named connections in the current schema. The SQL queries and example output are for illustration purpose only, and your output may be different depending on the data in your connections graph. In some examples, the output is reformatted for readability.

Topics:

Simple Property Graph Queries
The examples in this topic query vertices, edges, and properties of the graph.
Text Queries on Property Graphs
If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column.
Navigation and Graph Pattern Matching
A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends.
Navigation Options: CONNECT BY and Parallel Recursion
The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying.
Pivot
The PIVOT clause lets you dynamically add columns to a table to create a new table.
SQL-Based Property Graph Analytics
In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.

4.1 Simple Property Graph Queries

The examples in this topic query vertices, edges, and properties of the graph.

Example 4-1 Find a Vertex with a Specified Vertex ID

This example find the vertex with vertex ID 1 in the connections graph.

SQL> select vid, k, v, vn, vt 
      from connectionsVT$ 
      where vid=1;

The output might be as follows:

     1 country     United States
     1 name        Barack Obama
     1 occupation  44th president of United States of America
     ...

Example 4-2 Find an Edge with a Specified Edge ID

This example find the edge with edge ID 100 in the connections graph.

SQL> select eid,svid,dvid,k,t,v,vn,vt 
      from connectionsGE$ 
      where eid=1000;

The output might be as follows:

    1000  1 2 weight  3  1  1

In the preceding output, the K of the edge property is "weight" and the type ID of the value is 3, indicating a float value.

Example 4-3 Perform Simple Counting

This example performs simple counting in the connections graph.

SQL> -- Get the total number of K/V pairs of all the vertices
SQL> select /*+ parallel */ count(1) 
       from connectionsVT$;

    299

SQL> -- Get the total number of K/V pairs of all the edges
SQL> select /*+ parallel(8) */ count(1) 
       from connectionsGE$;
    164

SQL> -- Get the total number of vertices
SQL> select /*+ parallel */ count(distinct vid) 
       from connectionsVT$;

    78

SQL> -- Get the total number of edges
SQL> select /*+ parallel */ count(distinct eid) 
       from connectionsGE$;

    164

Example 4-4 Get the Set of Property Keys Used

This example gets the set of property keys used for the vertices n the connections graph.

SQL> select /*+ parallel */ distinct k 
      from connectionsVT$;

company
show
occupation
type
team
religion
criminal charge
music genre
genre
name
role
political party
country

13 rows selected.

SQL> -- get the set of property keys used for edges
SQL> select /*+ parallel */ distinct k 
       from connectionsGE$;

weight

Example 4-5 Find Vertices with a Value

This example finds vertices with a value (of any property) that is of String type, and where and the value contains two adjacent occurrences of a, e, i, o, or u, regardless of case.n the connections graph.

SQL> select vid, t, k, v 
       from connectionsVT$ 
      where t=1 
        and regexp_like(v, '([aeiou])\1', 'i');

     6        1  name  Jordan Peele 
     6        1  show  Key and Peele
    54        1  name  John Green
        ...

It is usually hard to leverage a B-Tree index for the preceding kind of query because it is difficult to know beforehand what kind of regular expression is going to be used. For the above query, you might get the following execution plan. Note that full table scan is chosen by the optimizer.

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                |    15 |   795 |    28    (0)| 00:00:01 |       |       |        |      |           |
|   1 |  PX COORDINATOR      |                |       |       |             |          |       |       |        |      |           |
|   2 |   PX SEND QC (RANDOM)| :TQ10000       |    15 |   795 |    28    (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND) |
|   3 |    PX BLOCK ITERATOR |                |    15 |   795 |    28    (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |           |
|*  4 |     TABLE ACCESS FULL| CONNECTIONSVT$ |    15 |   795 |    28    (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |           |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(INTERNAL_FUNCTION("V") AND  REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
   - Degree of Parallelism is 2 because of table property

If the Oracle Database In-Memory option is available and memory is sufficient, it can help performance to place the table (full table or a set of relevant columns) in memory. One way to achieve that is as follows:

SQL> alter table connectionsVT$ inmemory;
Table altered.

Now, entering the same SQL containing the regular expression shows a plan that performs a "TABLE ACCESS INMEMORY FULL".

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)  | Time     | Pstart| Pstop |    TQ    |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |    15 |   795 |    28     (0)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR               |                |       |       |              |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)         | :TQ10000       |    15 |   795 |    28     (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR          |                |    15 |   795 |    28     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS INMEMORY FULL| CONNECTIONSVT$ |    15 |   795 |    28     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(INTERNAL_FUNCTION("V") AND  REGEXP_LIKE ("V",U'([aeiou])\005C1','i') AND "T"=1 AND INTERNAL_FUNCTION("K"))
Note
-----
   - Degree of Parallelism is 2 because of table property

4.2 Text Queries on Property Graphs

If values of a property (vertex property or edge property) contain free text, then it might help performance to create an Oracle Text index on the V column.

Oracle Text can process text that is directly stored in the database. The text can be short strings (such as names or addresses), or it can be full-length documents. These documents can be in a variety of textual format.

The text can also be in many different languages. Oracle Text can handle any space-separated languages (including character sets such as Greek or Cyrillic). In addition, Oracle Text is able to handle the Chinese, Japanese and Korean pictographic languages)

Because the property graph feature uses NVARCHAR typed column for better support of Unicode, it is highly recommended that UTF8 (AL32UTF8) be used as the database character set.

To create an Oracle Text index on the vertices table (or edges table), the ALTER SESSION privilege is required. For example:

SQL> grant alter session to <YOUR_USER_SCHEMA_HERE>;

If customization is required, also grant the EXECUTE privilege on CTX_DDL:

SQL> grant execute on ctx_ddl to <YOUR_USER_SCHEMA_HERE>;

The following shows some example statements for granting these privileges to SCOTT.

SQL> conn / as sysdba
Connected.
SQL> -- This is a PDB setup -- 
SQL> alter session set container=orcl;
Session altered.

SQL> grant execute on ctx_ddl to scott;
Grant succeeded.

SQL> grant alter session to scott;
Grant succeeded.

Example 4-6 Create a Text Index

This example creates an Oracle Text index on the vertices table (V column) of the connections graph in the SCOTT schema. Note that unlike the text index capabilities provided by either Apache Lucene or Apache SolrCloud, the Oracle Text index created here is for all property keys, not just one or a subset of property keys. In addition, if a new property is added to the graph and the property value is of String data type, then it will automatically be included in the same text index.

The example uses the OPG_AUTO_LEXER lexer owned by MDSYS.

SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'MDSYS', lexer=>'OPG_AUTO_LEXER', dop=>2);

If customization is desired, you can use the ctx_ddl.create_preference API. For example:

SQL> -- The following requires access privilege to CTX_DDL
SQL> exec ctx_ddl.create_preference('SCOTT.OPG_AUTO_LEXER', 'AUTO_LEXER');

PL/SQL procedure successfully completed.

SQL> execute opg_apis.create_vertices_text_idx('scott', 'connections', pref_owner=>'scott', lexer=>'OPG_AUTO_LEXER', dop=>2);

PL/SQL procedure successfully completed.

You can now use a rich set of functions provided by Oracle Text to perform queries against graph elements.

Note:

If you no longer need an Oracle Text index, you can use the drop_vertices_text_idx or opg_apis.drop_edges_text_idx API to drop it. The following statements drop the text indexes on the vertices and edges of a graph named connections owned by SCOTT:

SQL> exec opg_apis.drop_vertices_text_Idx('scott', 'connections');
SQL> exec opg_apis.drop_edges_text_Idx('scott', 'connections');

Example 4-7 Find a Vertex that Has a Property Value

The following example find a vertex that has a property value (of string type) containing the keyword "Obama".

SQL> select vid, k, t, v 
       from connectionsVT$ 
      where t=1 
        and contains(v, 'Obama', 1) > 0 
      order by score(1) desc
      ;

The output and SQL execution plan from the preceding statement may appear as follows. Note that DOMAIN INDEX appears as an operation in the execution plan.

     1  name    1  Barack Obama

Execution Plan
----------------------------------------------------------
Plan hash value: 1619508090

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |     1 |    56 |     5  (20) | 00:00:01 |       |       |
|   1 |  SORT ORDER BY                      |                 |     1 |    56 |     5  (20) | 00:00:01 |       |       |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$  |     1 |    56 |     4    (0)| 00:00:01 | ROWID | ROWID |
|*  3 |    DOMAIN INDEX                     | CONNECTIONSXTV$ |       |       |     4    (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T"=1 AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
   3 - access("CTXSYS"."CONTAINS"("V",'Obama',1)>0)

Example 4-8 Fuzzy Match

The following example finds a vertex that has a property value (of string type) containing variants of "ameriian" (a deliberate misspelling for this example) Fuzzy match is used.

SQL> select vid, k, t, v 
       from connectionsVT$ 
      where contains(v, 'fuzzy(ameriian,,,weight)', 1) > 0 
      order by score(1) desc;

The output and SQL execution plan from the preceding statement may appear as follows.

     8 role      1  american business man
     9 role      1  american business man
     4 role      1  american economist
     6 role      1  american comedian actor
     7 role      1  american comedian actor
     1 occupation 1 44th president of United States of America

6 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1619508090

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |     1 |    56 |     5  (20)| 00:00:01 |       |       |
|   1 |  SORT ORDER BY                      |                 |     1 |    56 |     5  (20)| 00:00:01 |       |       |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$  |     1 |    56 |     4   (0)| 00:00:01 | ROWID | ROWID |
|*  3 |    DOMAIN INDEX                     | CONNECTIONSXTV$ |       |       |     4   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))

Example 4-9 Query Relaxation

The following example is a sophisticated Oracle Text query that implements query relaxation, which enables you to execute the most restrictive version of a query first, progressively relaxing the query until the required number of matches is obtained. Using query relaxation with queries that contain multiple strings, you can provide guidance for determining the “best” matches, so that these appear earlier in the results than other potential matches.

This example searchs for "american actor" with a query relaxation sequence.

SQL> select vid, k, t, v  
       from connectionsVT$ 
      where CONTAINS (v,
 '<query>
   <textquery lang="ENGLISH" grammar="CONTEXT">
     <progression>
       <seq>{american} {actor}</seq>
       <seq>{american} NEAR {actor}</seq>
       <seq>{american} AND {actor}</seq>
       <seq>{american} ACCUM {actor}</seq>
     </progression>
   </textquery>
   <score datatype="INTEGER" algorithm="COUNT"/>
  </query>') > 0;

The output and SQL execution plan from the preceding statement may appear as follows.

     7 role       1 american comedian actor
     6 role       1 american comedian actor
    44 occupation 1 actor
     8 role       1 american business man
    53 occupation 1 actor film producer
    52 occupation 1 actor
     4 role       1 american economist
    47 occupation 1 actor
     9 role       1 american business man

9 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2158361449

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 |       1 |      56 |       4   (0)| 00:00:01 |         |         |
|*  1 |  TABLE ACCESS BY GLOBAL INDEX ROWID| CONNECTIONSVT$  |       1 |      56 |       4   (0)| 00:00:01 | ROWID | ROWID |
|*  2 |   DOMAIN INDEX                     | CONNECTIONSXTV$ |         |         |       4   (0)| 00:00:01 |         |         |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
   2 - access("CTXSYS"."CONTAINS"("V",'<query>      <textquery lang="ENGLISH" grammar="CONTEXT">
          <progression>       <seq>{american} {actor}</seq>    <seq>{american} NEAR {actor}</seq>
          <seq>{american} AND {actor}</seq>        <seq>{american} ACCUM {actor}</seq>    </progression>
          </textquery>    <score datatype="INTEGER" algorithm="COUNT"/> </query>')>0)

Example 4-10 Find an Edge

Just as with vertices, you can create an Oracle Text index on the V column of the edges table (GE$) of a property graph. The following example uses the OPG_AUTO_LEXER lexer owned by MDSYS.

SQL> exec opg_apis.create_edges_text_idx('scott', 'connections', pref_owner=>'mdsys', lexer=>'OPG_AUTO_LEXER', dop=>4);

If customization is required, use the ctx_ddl.create_preference API.

4.3 Navigation and Graph Pattern Matching

A key benefit of using a graph data model is that you can easily navigate across entities (people, movies, products, services, events, and so on) that are modeled as vertices, following links and relationships modeled as edges. In addition, graph matching templates can be defined to do such things as detect patterns, aggregate individuals, and analyze trends.

This topic provides graph navigation and pattern matching examples using the example property graph named connections. Most of the SQL statements are relatively simple, but they can be used as building blocks to implement requirements that are more sophisticated. It is generally best to start from something simple, and progressively add complexity.

Example 4-11 Who Are a Person's Collaborators?

The following SQL ststement finds all entities that a vertex with ID 1 collaborates with. For simplicity, it considers only outgoing relationships.

SQL> select dvid, el, k, vn, v 
       from connectionsGE$ 
      where svid=1 
        and el='collaborates';

Note:

To find the specific vertex ID of interest, you can perform a text query on the property graph using keywords or fuzzy matching. (For details and examples, see Text Queries on Property Graphs.)

The preceding example's output and execution plan may be as follows.

     2 collaborates weight 1 1
    21 collaborates weight 1 1
    22 collaborates weight 1 1
      ....
    26 collaborates weight 1 1


10 rows selected.


-------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name            | Rows  | Bytes | Cost (%CPU)| Time      | Pstart| Pstop |    TQ    |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                 |    10 |   460 |     2     (0)| 00:00:01 |       |       |        |      |        |
|   1 |  PX COORDINATOR                              |                 |       |       |              |          |       |       |        |      |        |
|   2 |   PX SEND QC (RANDOM)                        | :TQ10000        |    10 |   460 |     2     (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION HASH ALL                     |                 |    10 |   460 |     2     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |        |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$  |    10 |   460 |     2     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |        |
|*  5 |      INDEX RANGE SCAN                        | CONNECTIONSXSE$ |    20 |       |     1     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |        |
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates' AND INTERNAL_FUNCTION("K") AND INTERNAL_FUNCTION("V"))
   5 - access("SVID"=1)

Example 4-12 Who Are a Person's Collaborators and What are Their Occupations?

The following SQL statement finds collaborators of the vertex with ID 1, and the occupation of each collaborator. A join with the vertices table (VT$) is required.

SQL> select dvid, vertices.v 
       from connectionsGE$, connectionsVT$ vertices 
      where svid=1 
        and el='collaborates' 
        and dvid=vertices.vid 
        and vertices.k='occupation';

The preceding example's output and execution plan may be as follows.

    21  67th United States Secretary of State
    22  68th United States Secretary of State
    23  chancellor
    28  7th president of Iran
    19  junior United States Senator from New York
...


--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |                 |     7 |   525 |     7   (0)| 00:00:01 |       |       |     |    |         |
|   1 |  PX COORDINATOR                               |                 |       |       |            |          |       |       |     |    |         |
|   2 |   PX SEND QC (RANDOM)                         | :TQ10000        |     7 |   525 |     7   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    NESTED LOOPS                               |                 |     7 |   525 |     7   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |         |
|   4 |     PX PARTITION HASH ALL                     |                 |    10 |   250 |     2   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |         |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$  |    10 |   250 |     2   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |         |
|*  6 |       INDEX RANGE SCAN                        | CONNECTIONSXSE$ |    20 |       |     1   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |         |
|   7 |     PARTITION HASH ITERATOR                   |                 |     1 |       |     0   (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |         |
|*  8 |      TABLE ACCESS BY LOCAL INDEX ROWID        | CONNECTIONSVT$  |       |       |            |          |   KEY |   KEY |  Q1,00 | PCWP |         |
|*  9 |       INDEX UNIQUE SCAN                       | CONNECTIONSXQV$ |     1 |       |     0   (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates')
   6 - access("SVID"=1)
   8 - filter(INTERNAL_FUNCTION("VERTICES"."V"))
   9 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'occupation')
       filter(INTERNAL_FUNCTION("VERTICES"."K"))

Example 4-13 Find a Person's Enemies and Aggregate Them by Their Country

The following SQL statement finds enemies (that is, those with the feuds relationship) of the vertex with ID 1, and aggregates them by their countries. A join with the vertices table (VT$) is required.

SQL>   select vertices.v, count(1) 
         from connectionsGE$, connectionsVT$ vertices 
        where svid=1 
          and el='feuds'  
          and dvid=vertices.vid 
          and vertices.k='country' 
     group by vertices.v;

The example's output and execution plan may be as follows. In this case, the vertex with ID 1 has 3 enemies in the United States and 1 in Russia.

United States      3
Russia             1


------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name            | Rows   | Bytes  | Cost (%CPU)| Time      | Pstart| Pstop |     TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |                 |      5 |    375 |      5  (20)| 00:00:01 |        |        |        |      |         |
|   1 |  PX COORDINATOR                               |                 |        |        |             |          |        |        |        |      |         |
|   2 |   PX SEND QC (RANDOM)                         | :TQ10001        |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                              |                 |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,01 | PCWP |         |
|   4 |     PX RECEIVE                                |                 |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,01 | PCWP |         |
|   5 |      PX SEND HASH                             | :TQ10000        |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,00 | P->P | HASH     |
|   6 |       HASH GROUP BY                           |                 |      5 |    375 |      5  (20)| 00:00:01 |        |        |  Q1,00 | PCWP |         |
|   7 |        NESTED LOOPS                           |                 |      5 |    375 |      4   (0)| 00:00:01 |        |        |  Q1,00 | PCWP |         |
|   8 |     PX PARTITION HASH ALL                     |                 |      5 |    125 |      2   (0)| 00:00:01 |      1 |      8 |  Q1,00 | PCWC |         |
|*  9 |      TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| CONNECTIONSGE$  |      5 |    125 |      2   (0)| 00:00:01 |      1 |      8 |  Q1,00 | PCWP |         |
|* 10 |       INDEX RANGE SCAN                        | CONNECTIONSXSE$ |     20 |        |      1   (0)| 00:00:01 |      1 |      8 |  Q1,00 | PCWP |         |
|  11 |     PARTITION HASH ITERATOR                   |                 |      1 |        |      0   (0)| 00:00:01 |    KEY |    KEY |  Q1,00 | PCWP |         |
|* 12 |      TABLE ACCESS BY LOCAL INDEX ROWID        | CONNECTIONSVT$  |        |        |             |          |    KEY |    KEY |  Q1,00 | PCWP |         |
|* 13 |       INDEX UNIQUE SCAN                       | CONNECTIONSXQV$ |      1 |        |      0   (0)| 00:00:01 |    KEY |    KEY |  Q1,00 | PCWP |         |
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   9 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'feuds')
  10 - access("SVID"=1)
  12 - filter(INTERNAL_FUNCTION("VERTICES"."V"))
  13 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country')
       filter(INTERNAL_FUNCTION("VERTICES"."K"))

Example 4-14 Find a Person's Collaborators, and aggregate and sort them

The following SQL statement finds the collaborators of the vertex with ID 1, aggregates them by their country, and sorts them in ascending order.

SQL> select vertices.v, count(1) 
      from connectionsGE$, connectionsVT$ vertices 
     where svid=1 
       and el='collaborates' 
       and dvid=vertices.vid 
       and vertices.k='country' 
  group by vertices.v 
  order by count(1) asc;

The example output and execution plan may be as follows. In this case, the vertex with ID 1 has the most collaborators in the United States.

Germany        1
Japan          1
Iran           1
United States  7


---------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                         | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ    |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                  |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |        |      |        |
|   1 |  PX COORDINATOR                                   |                 |       |       |              |          |       |       |        |      |        |
|   2 |   PX SEND QC (ORDER)                              | :TQ10002        |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,02 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY                                  |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,02 | PCWP |        |
|   4 |     PX RECEIVE                                    |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,02 | PCWP |        |
|   5 |      PX SEND RANGE                                | :TQ10001        |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,01 | P->P | RANGE  |
|   6 |       HASH GROUP BY                               |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,01 | PCWP |        |
|   7 |        PX RECEIVE                                 |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,01 | PCWP |        |
|   8 |     PX SEND HASH                                  | :TQ10000        |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,00 | P->P | HASH   |
|   9 |      HASH GROUP BY                                |                 |    10 |   750 |     9    (23)| 00:00:01 |       |       |  Q1,00 | PCWP |        |
|  10 |       NESTED LOOPS                                |                 |    10 |   750 |     7     (0)| 00:00:01 |       |       |  Q1,00 | PCWP |        |
|  11 |        PX PARTITION HASH ALL                      |                 |    10 |   250 |     2     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |        |
|* 12 |         TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | CONNECTIONSGE$  |    10 |   250 |     2     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |        |
|* 13 |          INDEX RANGE SCAN                         | CONNECTIONSXSE$ |    20 |       |     1     (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |        |
|  14 |        PARTITION HASH ITERATOR                    |                 |     1 |       |     0     (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |        |
|* 15 |         TABLE ACCESS BY LOCAL INDEX ROWID         | CONNECTIONSVT$  |       |       |        |                |   KEY |   KEY |  Q1,00 | PCWP |        |
|* 16 |          INDEX UNIQUE SCAN                        | CONNECTIONSXQV$ |     1 |       |     0     (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |        |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  12 - filter(INTERNAL_FUNCTION("EL") AND "EL"=U'collaborates')
  13 - access("SVID"=1)
  15 - filter(INTERNAL_FUNCTION("VERTICES"."V"))
  16 - access("DVID"="VERTICES"."VID" AND "VERTICES"."K"=U'country')
       filter(INTERNAL_FUNCTION("VERTICES"."K"))

4.4 Navigation Options: CONNECT BY and Parallel Recursion

The CONNECT BY clause and parallel recursion provide options for advanced navigation and querying.

CONNECT BY lets you navigate and find matches in a hierarchical order. To follow outgoing edges, you can use prior dvid = svid to guide the navigation.
Parallel recursion lets you perform navigation up to a specified number of hops away.

The examples use a property graph named connections.

Example 4-15 CONNECT WITH

The following SQL statement follows the outgoing edges by 1 hop.

SQL> select G.dvid
       from connectionsGE$ G
      start with svid = 1
    connect by nocycle prior dvid = svid and level <= 1;

The preceding example's output and execution plan may be as follows.

         2
         3
         4
         5
         6
         7
         8
         9
        10
        ...
------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |     7 |   273 |     3  (67)| 00:00:01 |       |       |        |      |            |
|*  1 |  CONNECT BY WITH FILTERING|                 |       |       |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR          |                 |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000        |     2 |    12 |     0   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX PARTITION HASH ALL |                 |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |            |
|*  5 |      INDEX RANGE SCAN     | CONNECTIONSXSE$ |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |            |
|*  6 |   FILTER                  |                 |       |       |            |          |       |       |        |      |            |
|   7 |    NESTED LOOPS           |                 |     5 |    95 |     1   (0)| 00:00:01 |       |       |        |      |            |
|   8 |     CONNECT BY PUMP       |                 |       |       |            |          |       |       |        |      |            |
|   9 |     PARTITION HASH ALL    |                 |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |        |      |            |
|* 10 |      INDEX RANGE SCAN     | CONNECTIONSXSE$ |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |        |      |            |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("SVID"=PRIOR "DVID")
       filter(LEVEL<=2)
   5 - access("SVID"=1)
   6 - filter(LEVEL<=2)
  10 - access("connect$_by$_pump$_002"."prior dvid "="SVID")

To extend from 1 hop to multiple hops, change 1 in the preceding example to another integer. For example, to change it to 2 hops, specify: level <= 2

Example 4-16 Parallel Recursion

The following SQL statement uses recursion within the WITH clause to perform navigation up to 4 hops away, a using recursively defined graph expansion: g_exp references g_exp in the query, and that defines the recursion. The example also uses the PARALLEL optimizer hint for parallel execution.

SQL> WITH g_exp(svid, dvid, depth) as
  ( 
    select svid as svid, dvid as dvid, 0 as depth
      from connectionsGE$
     where svid=1
   union all
     select g2.svid,  g1.dvid, g2.depth + 1
       from g_exp g2, connectionsGE$ g1
      where g2.dvid=g1.svid
        and g2.depth <= 3
  )   
select  /*+ parallel(4) */ dvid, depth
  from  g_exp
 where svid=1
;

The example's output and execution plan may be as follows. Note that CURSOR DURATION MEMORY is chosen in the execution, which indicates the graph expansion stores the intermediate data in memory.

        22          4
        25          4
        24          4
         1          4

        23          4
        33          4
        22          4
        22          4
       ...         ...


Execution Plan
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                            |   801 | 31239 |   147   (0)| 00:00:01 |       |       |        |      |            |
|   1 |  TEMP TABLE TRANSFORMATION                   |                            |       |       |            |          |       |       |        |      |            |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)    | SYS_TEMP_0FD9D6614_11CB2D2 |       |       |            |          |       |       |        |      |            |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST  |                            |       |       |            |          |       |       |        |      |            |
|   4 |     PX COORDINATOR                           |                            |       |       |            |          |       |       |        |      |            |
|   5 |      PX SEND QC (RANDOM)                     | :TQ20000                   |     2 |    12 |     0   (0)| 00:00:01 |       |       |  Q2,00 | P->S | QC (RAND)  |
|   6 |       LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 |       |       |            |          |       |       |  Q2,00 | PCWP |            |
|   7 |        PX PARTITION HASH ALL                 |                            |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |  Q2,00 | PCWC |            |
|*  8 |         INDEX RANGE SCAN                     | CONNECTIONSXSE$            |     2 |    12 |     0   (0)| 00:00:01 |     1 |     8 |  Q2,00 | PCWP |            |
|   9 |     PX COORDINATOR                           |                            |       |       |            |          |       |       |        |      |            |
|  10 |      PX SEND QC (RANDOM)                     | :TQ10000                   |   799 |    12M|    12   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|  11 |       LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6614_11CB2D2 |       |       |            |          |       |       |  Q1,00 | PCWP |            |
|* 12 |        HASH JOIN                             |                            |   799 |    12M|    12   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|  13 |         BUFFER SORT (REUSE)                  |                            |       |       |            |          |       |       |  Q1,00 | PCWP |            |
|  14 |          PARTITION HASH ALL                  |                            |   164 |   984 |     2   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWC |            |
|  15 |           INDEX FAST FULL SCAN               | CONNECTIONSXDE$            |   164 |   984 |     2   (0)| 00:00:01 |     1 |     8 |  Q1,00 | PCWP |            |
|  16 |         PX BLOCK ITERATOR                    |                            |       |       |            |          |       |       |  Q1,00 | PCWC |            |
|* 17 |          TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6614_11CB2D2 |       |       |            |          |       |       |  Q1,00 | PCWP |            |
|  18 |   PX COORDINATOR                             |                            |       |       |            |          |       |       |        |      |            |
|  19 |    PX SEND QC (RANDOM)                       | :TQ30000                   |   801 | 31239 |   135   (0)| 00:00:01 |       |       |  Q3,00 | P->S | QC (RAND)  |
|* 20 |     VIEW                                     |                            |   801 | 31239 |   135   (0)| 00:00:01 |       |       |  Q3,00 | PCWP |            |
|  21 |      PX BLOCK ITERATOR                       |                            |   801 |    12M|   135   (0)| 00:00:01 |       |       |  Q3,00 | PCWC |            |
|  22 |       TABLE ACCESS FULL                      | SYS_TEMP_0FD9D6614_11CB2D2 |   801 |    12M|   135   (0)| 00:00:01 |       |       |  Q3,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("SVID"=1)
  12 - access("G2"."DVID"="G1"."SVID")
  17 - filter("G2"."INTERNAL_ITERS$"=LEVEL AND "G2"."DEPTH"<=3)
  20 - filter("SVID"=1)

4.5 Pivot

The PIVOT clause lets you dynamically add columns to a table to create a new table.

The schema design (VT$ and GE$) of the property graph is narrow ("skinny") rather than wide ("fat"). This means that if a vertex or edge has multiple properties, those property keys, values, data types, and so on will be stored using multiple rows instead of multiple columns. Such a design is very flexible in the sense that you can add properties dynamically without having to worry about adding too many columns or even reaching the physical maximum limit of number of columns a table may have. However, for some applications you may prefer to have a wide table if the properties are somewhat homogeneous.

Example 4-17 Pivot

The following CREATE TABLE ... AS SELECT statement uses PIVOT to add four columns: ‘company’,’ occupation’,’ name’, and ‘religion’.

SQL> CREATE TABLE table pg_wide
as
 with G AS (select vid, k, t, v
              from connectionsVT$
           )
 select *
   from G
  pivot (
    min(v) for k in ('company', 'occupation', 'name', 'religion')
  );

Table created.

The following DESCRIBE statement shows the definition of the new table, including the four added columns. (The output is reformatted for readability.)

SQL> DESCRIBE pg_wide;
 Name                                                  Null?    Type
--------------------------------------------------- -------- --------------------
 VID                                                   NOT NULL NUMBER
 T                                                              NUMBER(38)
 'company'                                                      NVARCHAR2(15000)
 'occupation'                                                   NVARCHAR2(15000)
 'name'                                                         NVARCHAR2(15000)
 'religion'                                                     NVARCHAR2(15000)

4.6 SQL-Based Property Graph Analytics

In addition to the analytical functions offered by the in-memory analyst, the property graph feature in Oracle Spatial and Graph supports several native, SQL-based property graph analytics.

SQL-based analytics are especially useful if:

The computation is not bound by the physical memory.
There is no need to move the graph data outside the database.
Computation is easily done against the latest snapshot of a property graph.
There is no need to 'sync up' (synchronize) data inside and outside the database.

However, when a graph (or a subgraph) fits in memory, then running analytics provided by the in-memory analyst usually provides better better performance than using SQL-based analytics. Because many of the analytics implementation require using intermediate data structures, most SQL- (PL/SQL-) based analytics APIs have parameters for working tables (wt). A typical flow has the following steps:

Prepare the working table or tables.
Perform analytics (one or multiple calls).
Perform cleanup

Example 4-18 Shortest Path

Consider shortest path, for example. Internally, Oracle Database uses the bidirectional Dijkstra algorithm. The following code snippet shows an end-to-end flow.

set serveroutput on
  
DECLARE   
  wt1 varchar2(100);  -- intermediate working tables
  n number;
  path    varchar2(1000);
  weights varchar2(1000);
BEGIN
  -- prepare
  opg_apis.find_sp_prep('connectionsGE$', wt1);
  dbms_output.put_line('working table name    ' || wt1);

  -- compute
  opg_apis.find_sp(
     'connectionsGE$',
      1,                          -- start vertex ID
      53,                         -- destination vertex ID
      wt1,                        -- working table (for Dijkstra expansion)
      dop => 1,                   -- degree of parallelism
      stats_freq=>1000,           -- frequency to collect statistics
      path_output => path,        -- shortest path (a sequence of vertices)
      weights_output => weights,  -- edge weights
      options => null
      );
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- cleanup (commented out here; see text after the example)
  -- opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/

This example may produce the following output. Note that if no working table name is provided, the preparation step will automatically generate a temporary table name and create it. Because the temporary working table name uses the session ID, your output will probably be different.

working table name    "CONNECTIONSGE$$TWFS12"
path    1 3    52 53
weights 4 3 1    1 1

PL/SQL procedure successfully completed.

If you want to know the definition of the working table or tables, then skip the cleanup phase (as shown in the preceding example that comments out the call to find_sp_cleanup). After the computation is done, you can describe the working table or tables.

SQL> describe "CONNECTIONSGE$$TWFS12"
 Name                Null?    Type
 --------- -------- ----------------------------
 NID                            NUMBER
 D2S                            NUMBER
 P2S                            NUMBER
 D2T                            NUMBER
 P2T                            NUMBER
 F                            NUMBER(38)
 B                            NUMBER(38)

For advanced users who want to try different table creation options, such as using inmemory or advanced compression, you can pre-create the preceding working table and pass the name in.

Example 4-19 Create Working Table and Perform Analytics

The following statements first create a working table with the same column structure and basic compression enabled, then pass it to the computation.

create table connections$MY_EXP(
 NID                            NUMBER,
 D2S                            NUMBER,
 P2S                            NUMBER,
 D2T                            NUMBER,
 P2T                            NUMBER,
 F                            NUMBER(38),
 B                            NUMBER(38)
) compress nologging;


DECLARE
  wt1 varchar2(100) := 'connections$MY_EXP';
  n number;
  path    varchar2(1000);
  weights varchar2(1000);
BEGIN
  dbms_output.put_line('working table name    ' || wt1);

  -- compute
  opg_apis.find_sp(
     'connectionsGE$',
      1,
      53,
      wt1,
      dop => 1,
      stats_freq=>1000,
      path_output => path,
      weights_output => weights,
      options => null
      );
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- cleanup
  -- opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/

At the end of the computation, if the working table has not been dropped or truncated, you can check the content of the working table, as follows. Note that the working table structure may vary between releases.

SQL> select * from connections$MY_EXP;
       NID        D2S        P2S        D2T        P2T          F          B
---------- ---------- ---------- ---------- ---------- ---------- ----------
         1          0            1.000E+100                     1         -1
        53 1.000E+100                     0                    -1          1
        54 1.000E+100                     1         53         -1          1
        52 1.000E+100                     1         53         -1          1
         5          1          1 1.000E+100                     0         -1
        26          1          1 1.000E+100                     0         -1
         8       1000          1 1.000E+100                     0         -1
         3          1          1          2         52          0          0
        15          1          1 1.000E+100                     0         -1
        21          1          1 1.000E+100                     0         -1
        19          1          1 1.000E+100                     0         -1
       ...

Example 4-20 Multiple Calls to Same Graph

To perform multiple calls to the same graph, only a single call to the preparation step is needed. The following shows an example of computing shortest path for multiple pairs of vertices.

DECLARE
  wt1 varchar2(100);  -- intermediate working tables
  n number;
  path    varchar2(1000);
  weights varchar2(1000);
BEGIN
  -- prepare
  opg_apis.find_sp_prep('connectionsGE$', wt1);
  dbms_output.put_line('working table name    ' || wt1);

  -- find shortest path from vertex 1 to vertex 53
  opg_apis.find_sp( 'connectionsGE$', 1, 53,
      wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- find shortest path from vertex 2 to vertex 36
  opg_apis.find_sp( 'connectionsGE$', 2, 36,
      wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- find shortest path from vertex 30 to vertex 4
  opg_apis.find_sp( 'connectionsGE$', 30, 4,
      wt1, dop => 1, stats_freq=>1000, path_output => path, weights_output => weights, options => null);
  dbms_output.put_line('path    ' || path);
  dbms_output.put_line('weights ' || weights);

  -- cleanup
  opg_apis.find_sp_cleanup('connectionsGE$', wt1);
END;
/

The example's output may be as follows: three shortest paths have been found for the multiple pairs of vertices provided.

working table name    "CONNECTIONSGE$$TWFS12"
path    1 3    52 53
weights 4 3 1   1 1
path    2    36
weights 2 1   1
path    30 21    1 4
weights 4 3 1   1 1

PL/SQL procedure successfully completed.