26 Scoring and Deployment
Explains the scoring and deployment features of Oracle Data Mining.
26.1 About Scoring and Deployment
Scoring is the application of models to new data. In Oracle Data Mining, scoring is performed by SQL language functions.
Predictive functions perform Classification, Regression, or Anomaly detection. Clustering functions assign rows to clusters. Feature Extraction functions transform the input data to a set of higher order predictors. A scoring procedure is also available in the DBMS_DATA_MINING
PL/SQL package.
Deployment refers to the use of models in a target environment. Once the models have been built, the challenges come in deploying them to obtain the best results, and in maintaining them within a production environment. Deployment can be any of the following:
-
Scoring data either for batch or real-time results. Scores can include predictions, probabilities, rules, and other statistics.
-
Extracting model details to produce reports. For example: clustering rules, decision tree rules, or attribute rankings from an Attribute Importance model.
-
Extending the business intelligence infrastructure of a data warehouse by incorporating mining results in applications or operational systems.
-
Moving a model from the database where it was built to the database where it used for scoring (export/import)
Oracle Data Mining supports all of these deployment scenarios.
Note:
Oracle Data Mining scoring operations support parallel execution. When parallel execution is enabled, multiple CPU and I/O resources are applied to the execution of a single database operation.
Parallel execution offers significant performance improvements, especially for operations that involve complex queries and large databases typically associated with decision support systems (DSS) and data warehouses.
26.2 Using the Data Mining SQL Functions
Learn about the benefits of SQL functions in data mining.
The data mining SQL functions provide the following benefits:
-
Models can be easily deployed within the context of existing SQL applications.
-
Scoring operations take advantage of existing query execution functionality. This provides performance benefits.
-
Scoring results are pipelined, enabling the rows to be processed without requiring materialization.
The data mining functions produce a score for each row in the selection. The functions can apply a mining model schema object to compute the score, or they can score dynamically without a pre-defined model, as described in "Dynamic Scoring".
26.2.1 Choosing the Predictors
The data mining functions support a USING
clause that specifies which attributes to use for scoring. You can specify some or all of the attributes in the selection and you can specify expressions. The following examples all use the PREDICTION
function to find the customers who are likely to use an affinity card, but each example uses a different set of predictors.
The query in Example 26-1 uses all the predictors.
The query in Example 26-2 uses only gender, marital status, occupation, and income as predictors.
The query in Example 26-3 uses three attributes and an expression as predictors. The prediction is based on gender, marital status, occupation, and the assumption that all customers are in the highest income bracket.
Example 26-1 Using All Predictors
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age FROM mining_data_apply_v WHERE PREDICTION(dt_sh_clas_sample USING *) = 1 GROUP BY cust_gender ORDER BY cust_gender; C CNT AVG_AGE - ---------- ---------- F 25 38 M 213 43
Example 26-2 Using Some Predictors
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age FROM mining_data_apply_v WHERE PREDICTION(dt_sh_clas_sample USING cust_gender,cust_marital_status, occupation, cust_income_level) = 1 GROUP BY cust_gender ORDER BY cust_gender; C CNT AVG_AGE - ---------- ---------- F 30 38 M 186 43
Example 26-3 Using Some Predictors and an Expression
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age FROM mining_data_apply_v WHERE PREDICTION(dt_sh_clas_sample USING cust_gender, cust_marital_status, occupation, 'L: 300,000 and above' AS cust_income_level) = 1 GROUP BY cust_gender ORDER BY cust_gender; C CNT AVG_AGE - ---------- ---------- F 30 38 M 186 43
26.2.2 Single-Record Scoring
The data mining functions can produce a score for a single record, as shown in Example 26-4 and Example 26-5.
Example 26-4 returns a prediction for customer 102001 by applying the classification model NB_SH_Clas_sample
. The resulting score is 0, meaning that this customer is unlikely to use an affinity card.
Example 26-5 returns a prediction for 'Affinity card is great' as the comments attribute by applying the text mining model T_SVM_Clas_sample
. The resulting score is 1, meaning that this customer is likely to use an affinity card.
Example 26-4 Scoring a Single Customer or a Single Text Expression
SELECT PREDICTION (NB_SH_Clas_Sample USING *) FROM sh.customers where cust_id = 102001; PREDICTION(NB_SH_CLAS_SAMPLEUSING*) ----------------------------------- 0
Example 26-5 Scoring a Single Text Expression
SELECT PREDICTION(T_SVM_Clas_sample USING 'Affinity card is great' AS comments) FROM DUAL; PREDICTION(T_SVM_CLAS_SAMPLEUSING'AFFINITYCARDISGREAT'ASCOMMENTS) ----------------------------------------------------------------- 1
26.3 Prediction Details
Prediction details are XML strings that provide information about the score. Details are available for all types of scoring: clustering, feature extraction, classification, regression, and anomaly detection. Details are available whether scoring is dynamic or the result of model apply.
The details functions, CLUSTER_DETAILS
, FEATURE_DETAILS
, and PREDICTION_DETAILS
return the actual value of attributes used for scoring and the relative importance of the attributes in determining the score. By default, the functions return the five most important attributes in descending order of importance.
26.3.1 Cluster Details
For the most likely cluster assignments of customer 100955 (probability of assignment > 20%), the query in the following example produces the five attributes that have the most impact for each of the likely clusters. The clustering functions apply an Expectation Maximization model named em_sh_clus_sample
to the data selected from mining_data_apply_v
. The "5" specified in CLUSTER_DETAILS
is not required, because five attributes are returned by default.
Example 26-6 Cluster Details
SELECT S.cluster_id, probability prob, CLUSTER_DETAILS(em_sh_clus_sample, S.cluster_id, 5 USING T.*) det FROM (SELECT v.*, CLUSTER_SET(em_sh_clus_sample, NULL, 0.2 USING *) pset FROM mining_data_apply_v v WHERE cust_id = 100955) T, TABLE(T.pset) S ORDER BY 2 DESC; CLUSTER_ID PROB DET ---------- ----- ---------------------------------------------------------------------------- 14 .6761 <Details algorithm="Expectation Maximization" cluster="14"> <Attribute name="AGE" actualValue="51" weight=".676" rank="1"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".557" rank="2"/> <Attribute name="FLAT_PANEL_MONITOR" actualValue="0" weight=".412" rank="3"/> <Attribute name="Y_BOX_GAMES" actualValue="0" weight=".171" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION"actualValue="1" weight="-.003" rank="5"/> </Details> 3 .3227 <Details algorithm="Expectation Maximization" cluster="3"> <Attribute name="YRS_RESIDENCE" actualValue="3" weight=".323" rank="1"/> <Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".265" rank="2"/> <Attribute name="EDUCATION" actualValue="HS-grad" weight=".172" rank="3"/> <Attribute name="AFFINITY_CARD" actualValue="0" weight=".125" rank="4"/> <Attribute name="OCCUPATION" actualValue="Crafts" weight=".055" rank="5"/> </Details>
26.3.2 Feature Details
The query in the following example returns the three attributes that have the greatest impact on the top Principal Components Analysis (PCA) projection for customer 101501. The FEATURE_DETAILS
function applies a Singular Value Decomposition model named svd_sh_sample
to the data selected from svd_sh_sample_build_num
.
Example 26-7 Feature Details
SELECT FEATURE_DETAILS(svd_sh_sample, 1, 3 USING *) proj1det FROM svd_sh_sample_build_num WHERE CUST_ID = 101501; PROJ1DET -------------------------------------------------------------------------------- <Details algorithm="Singular Value Decomposition" feature="1"> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".352" rank="1"/> <Attribute name="Y_BOX_GAMES" actualValue="0" weight=".249" rank="2"/> <Attribute name="AGE" actualValue="41" weight=".063" rank="3"/> </Details>
26.3.3 Prediction Details
The query in the following example returns the attributes that are most important in predicting the age of customer 100010. The prediction functions apply a Generalized Linear Model Regression model named GLMR_SH_Regr_sample
to the data selected from mining_data_apply_v
.
Example 26-8 Prediction Details for Regression
SELECT cust_id, PREDICTION(GLMR_SH_Regr_sample USING *) pr, PREDICTION_DETAILS(GLMR_SH_Regr_sample USING *) pd FROM mining_data_apply_v WHERE CUST_ID = 100010; CUST_ID PR PD ------- ----- ----------- 100010 25.45 <Details algorithm="Generalized Linear Model"> <Attribute name="FLAT_PANEL_MONITOR" actualValue="1" weight=".025" rank="1"/> <Attribute name="OCCUPATION" actualValue="Crafts" weight=".019" rank="2"/> <Attribute name="AFFINITY_CARD" actualValue="0" weight=".01" rank="3"/> <Attribute name="OS_DOC_SET_KANJI" actualValue="0" weight="0" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight="-.004" rank="5"/> </Details>
The query in the following example returns the customers who work in Tech Support and are likely to use an affinity card (with more than 85% probability). The prediction functions apply an Support Vector Machine (SVM) Classification model named svmc_sh_clas_sample
. to the data selected from mining_data_apply_v
. The query includes the prediction details, which show that education is the most important predictor.
Example 26-9 Prediction Details for Classification
SELECT cust_id, PREDICTION_DETAILS(svmc_sh_clas_sample, 1 USING *) PD FROM mining_data_apply_v WHERE PREDICTION_PROBABILITY(svmc_sh_clas_sample, 1 USING *) > 0.85 AND occupation = 'TechSup' ORDER BY cust_id; CUST_ID PD ------- --------------------------------------------------------------------------------------- 100029 <Details algorithm="Support Vector Machines" class="1"> <Attribute name="EDUCATION" actualValue="Assoc-A" weight=".199" rank="1"/> <Attribute name="CUST_INCOME_LEVEL" actualValue="I: 170\,000 - 189\,999" weight=".044" rank="2"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".028" rank="3"/> <Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".024" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".022" rank="5"/> </Details> 100378 <Details algorithm="Support Vector Machines" class="1"> <Attribute name="EDUCATION" actualValue="Assoc-A" weight=".21" rank="1"/> <Attribute name="CUST_INCOME_LEVEL" actualValue="B: 30\,000 - 49\,999" weight=".047" rank="2"/> <Attribute name="FLAT_PANEL_MONITOR" actualValue="0" weight=".043" rank="3"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".03" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".023" rank="5"/> </Details> 100508 <Details algorithm="Support Vector Machines" class="1"> <Attribute name="EDUCATION" actualValue="Bach." weight=".19" rank="1"/> <Attribute name="CUST_INCOME_LEVEL" actualValue="L: 300\,000 and above" weight=".046" rank="2"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".031" rank="3"/> <Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".026" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".024" rank="5"/> </Details> 100980 <Details algorithm="Support Vector Machines" class="1"> <Attribute name="EDUCATION" actualValue="Assoc-A" weight=".19" rank="1"/> <Attribute name="FLAT_PANEL_MONITOR" actualValue="0" weight=".038" rank="2"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".026" rank="3"/> <Attribute name="BULK_PACK_DISKETTES" actualValue="1" weight=".022" rank="4"/> <Attribute name="BOOKKEEPING_APPLICATION" actualValue="1" weight=".02" rank="5"/> </Details>
The query in the following example returns the two customers that differ the most from the rest of the customers. The prediction functions apply an anomaly detection model named SVMO_SH_Clas_sample
to the data selected from mining_data_apply_v
. Anomaly Detection uses a one-class SVM classifier.
Example 26-10 Prediction Details for Anomaly Detection
SELECT cust_id, pd FROM (SELECT cust_id, PREDICTION_DETAILS(SVMO_SH_Clas_sample, 0 USING *) pd, RANK() OVER (ORDER BY prediction_probability( SVMO_SH_Clas_sample, 0 USING *) DESC, cust_id) rnk FROM mining_data_one_class_v) WHERE rnk <= 2 ORDER BY rnk; CUST_ID PD ---------- ----------------------------------------------------------------------------------- 102366 <Details algorithm="Support Vector Machines" class="0"> <Attribute name="COUNTRY_NAME" actualValue="United Kingdom" weight=".078" rank="1"/> <Attribute name="CUST_MARITAL_STATUS" actualValue="Divorc." weight=".027" rank="2"/> <Attribute name="CUST_GENDER" actualValue="F" weight=".01" rank="3"/> <Attribute name="HOUSEHOLD_SIZE" actualValue="9+" weight=".009" rank="4"/> <Attribute name="AGE" actualValue="28" weight=".006" rank="5"/> </Details> 101790 <Details algorithm="Support Vector Machines" class="0"> <Attribute name="COUNTRY_NAME" actualValue="Canada" weight=".068" rank="1"/> <Attribute name="HOUSEHOLD_SIZE" actualValue="4-5" weight=".018" rank="2"/> <Attribute name="EDUCATION" actualValue="7th-8th" weight=".015" rank="3"/> <Attribute name="CUST_GENDER" actualValue="F" weight=".013" rank="4"/> <Attribute name="AGE" actualValue="38" weight=".001" rank="5"/> </Details>
26.3.4 GROUPING Hint
Data mining functions consist of SQL functions such as PREDICTION*
, CLUSTER*
, FEATURE*
, and ORA_DM_*
. The GROUPING
hint is an optional hint which applies to data mining scoring functions when scoring partitioned models.
Enhanced PREDICTION Function Command Format
This hint results in partitioning the input data set into distinct data slices so that each partition is scored in its entirety before advancing to the next partition. However, parallelism by partition is still available. Data slices are determined by the partitioning key columns used when the model was built. This method can be used with any data mining function against a partitioned model. The hint may yield a query performance gain when scoring large data that is associated with many partitions but may negatively impact performance when scoring large data with few partitions on large systems. Typically, there is no performance gain if you use the hint for single row queries.
<prediction function> ::=
PREDICTION <left paren> /*+ GROUPING */ <prediction model>
[ <comma> <class value> [ <comma> <top N> ] ]
USING <mining attribute list> <right paren>
The syntax for only the PREDICTION
function is given but it is applicable to any Data mining function where PREDICTION
, CLUSTERING
, and FEATURE_EXTRACTION
scoring functions occur.
Example 26-11 Example
SELECT PREDICTION(/*+ GROUPING */my_model USING *) pred FROM <input table>;
Related Topics
26.4 Real-Time Scoring
Oracle Data Mining SQL functions enable prediction, clustering, and feature extraction analysis to be easily integrated into live production and operational systems. Because mining results are returned within SQL queries, mining can occur in real time.
With real-time scoring, point-of-sales database transactions can be mined. Predictions and rule sets can be generated to help front-line workers make better analytical decisions. Real-time scoring enables fraud detection, identification of potential liabilities, and recognition of better marketing and selling opportunities.
The query in the following example uses a Decision Tree model named dt_sh_clas_sample
to predict the probability that customer 101488 uses an affinity card. A customer representative can retrieve this information in real time when talking to this customer on the phone. Based on the query result, the representative can offer an extra-value card, since there is a 73% chance that the customer uses a card.
26.5 Dynamic Scoring
The Data Mining SQL functions operate in two modes: by applying a pre-defined model, or by executing an analytic clause. If you supply an analytic clause instead of a model name, the function builds one or more transient models and uses them to score the data.
The ability to score data dynamically without a pre-defined model extends the application of basic embedded data mining techniques into environments where models are not available. Dynamic scoring, however, has limitations. The transient models created during dynamic scoring are not available for inspection or fine tuning. Applications that require model inspection, the correlation of scoring results with the model, special algorithm settings, or multiple scoring queries that use the same model, require a predefined model.
The following example shows a dynamic scoring query. The example identifies the rows in the input data that contain unusual customer age values.
Example 26-13 Dynamic Prediction
SELECT cust_id, age, pred_age, age-pred_age age_diff, pred_det FROM (SELECT cust_id, age, pred_age, pred_det, RANK() OVER (ORDER BY ABS(age-pred_age) DESC) rnk FROM (SELECT cust_id, age, PREDICTION(FOR age USING *) OVER () pred_age, PREDICTION_DETAILS(FOR age ABS USING *) OVER () pred_det FROM mining_data_apply_v)) WHERE rnk <= 5; CUST_ID AGE PRED_AGE AGE_DIFF PRED_DET ------- ---- ---------- -------- -------------------------------------------------------------- 100910 80 40.6686505 39.33 <Details algorithm="Support Vector Machines"> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".059" rank="1"/> <Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059" rank="2"/> <Attribute name="AFFINITY_CARD" actualValue="0" weight=".059" rank="3"/> <Attribute name="FLAT_PANEL_MONITOR" actualValue="1" weight=".059" rank="4"/> <Attribute name="YRS_RESIDENCE" actualValue="4" weight=".059" rank="5"/> </Details> 101285 79 42.1753571 36.82 <Details algorithm="Support Vector Machines"> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".059" rank="1"/> <Attribute name="HOUSEHOLD_SIZE" actualValue="2" weight=".059" rank="2"/> <Attribute name="CUST_MARITAL_STATUS" actualValue="Mabsent" weight=".059" rank="3"/> <Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059" rank="4"/> <Attribute name="OCCUPATION" actualValue="Prof." weight=".059" rank="5"/> </Details> 100694 77 41.0396722 35.96 <Details algorithm="Support Vector Machines"> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".059" rank="1"/> <Attribute name="EDUCATION" actualValue="< Bach." weight=".059" rank="2"/> <Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059" rank="3"/> <Attribute name="CUST_ID" actualValue="100694" weight=".059" rank="4"/> <Attribute name="COUNTRY_NAME" actualValue="United States of America" weight=".059" rank="5"/> </Details> 100308 81 45.3252491 35.67 <Details algorithm="Support Vector Machines"> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".059" rank="1"/> <Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059" rank="2"/> <Attribute name="HOUSEHOLD_SIZE" actualValue="2" weight=".059" rank="3"/> <Attribute name="FLAT_PANEL_MONITOR" actualValue="1" weight=".059" rank="4"/> <Attribute name="CUST_GENDER" actualValue="F" weight=".059" rank="5"/> </Details> 101256 90 54.3862214 35.61 <Details algorithm="Support Vector Machines"> <Attribute name="YRS_RESIDENCE" actualValue="9" weight=".059" rank="1"/> <Attribute name="HOME_THEATER_PACKAGE" actualValue="1" weight=".059" rank="2"/> <Attribute name="EDUCATION" actualValue="< Bach." weight=".059" rank="3"/> <Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059" rank="4"/> <Attribute name="COUNTRY_NAME" actualValue="United States of America" weight=".059" rank="5"/> </Details>
26.6 Cost-Sensitive Decision Making
Costs are user-specified numbers that bias Classification. The algorithm uses positive numbers to penalize more expensive outcomes over less expensive outcomes. Higher numbers indicate higher costs.
The algorithm uses negative numbers to favor more beneficial outcomes over less beneficial outcomes. Lower negative numbers indicate higher benefits.
All classification algorithms can use costs for scoring. You can specify the costs in a cost matrix table, or you can specify the costs inline when scoring. If you specify costs inline and the model also has an associated cost matrix, only the inline costs are used. The PREDICTION
, PREDICTION_SET
, and PREDICTION_COST
functions support costs.
Only the Decision Tree algorithm can use costs to bias the model build. If you want to create a Decision Tree model with costs, create a cost matrix table and provide its name in the CLAS_COST_TABLE_NAME
setting for the model. If you specify costs when building the model, the cost matrix used to create the model is used when scoring. If you want to use a different cost matrix table for scoring, first remove the existing cost matrix table then add the new one.
A sample cost matrix table is shown in the following table. The cost matrix specifies costs for a binary target. The matrix indicates that the algorithm must treat a misclassified 0 as twice as costly as a misclassified 1.
Table 26-1 Sample Cost Matrix
ACTUAL_TARGET_VALUE | PREDICTED_TARGET_VALUE | COST |
---|---|---|
0 |
0 |
0 |
0 |
1 |
2 |
1 |
0 |
1 |
1 |
1 |
0 |
Example 26-14 Sample Queries With Costs
The table nbmodel_costs
contains the cost matrix described in Table 26-1.
SELECT * from nbmodel_costs; ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE COST ------------------- ---------------------- ---------- 0 0 0 0 1 2 1 0 1 1 1 0
The following statement associates the cost matrix with a Naive Bayes model called nbmodel.
BEGIN dbms_data_mining.add_cost_matrix('nbmodel', 'nbmodel_costs'); END; /
The following query takes the cost matrix into account when scoring mining_data_apply_v
. The output is restricted to those rows where a prediction of 1 is less costly then a prediction of 0.
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age FROM mining_data_apply_v WHERE PREDICTION (nbmodel COST MODEL USING cust_marital_status, education, household_size) = 1 GROUP BY cust_gender ORDER BY cust_gender; C CNT AVG_AGE - ---------- ---------- F 25 38 M 208 43
You can specify costs inline when you invoke the scoring function. If you specify costs inline and the model also has an associated cost matrix, only the inline costs are used. The same query is shown below with different costs specified inline. Instead of the "2" shown in the cost matrix table (Table 26-1), "10" is specified in the inline costs.
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age FROM mining_data_apply_v WHERE PREDICTION (nbmodel COST (0,1) values ((0, 10), (1, 0)) USING cust_marital_status, education, household_size) = 1 GROUP BY cust_gender ORDER BY cust_gender; C CNT AVG_AGE - ---------- ---------- F 74 39 M 581 43
The same query based on probability instead of costs is shown below.
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age FROM mining_data_apply_v WHERE PREDICTION (nbmodel USING cust_marital_status, education, household_size) = 1 GROUP BY cust_gender ORDER BY cust_gender; C CNT AVG_AGE - ---------- ---------- F 73 39 M 577 44
26.7 DBMS_DATA_MINING.Apply
The APPLY
procedure in DBMS_DATA_MINING
is a batch apply operation that writes the results of scoring directly to a table.
The columns in the table are mining function-dependent.
Scoring with APPLY
generates the same results as scoring with the SQL scoring functions. Classification produces a prediction and a probability for each case; clustering produces a cluster ID and a probability for each case, and so on. The difference lies in the way that scoring results are captured and the mechanisms that can be used for retrieving them.
APPLY
creates an output table with the columns shown in the following table:
Table 26-2 APPLY Output Table
Mining Function | Output Columns |
---|---|
classification |
|
regression |
|
anomaly detection |
|
clustering |
|
feature extraction |
|
Since APPLY
output is stored separately from the scoring data, it must be joined to the scoring data to support queries that include the scored rows. Thus any model that is used with APPLY
must have a case ID.
A case ID is not required for models that is applied with SQL scoring functions. Likewise, storage and joins are not required, since scoring results are generated and consumed in real time within a SQL query.
The following example illustrates Anomaly Detection with APPLY
. The query of the APPLY
output table returns the ten first customers in the table. Each has a a probability for being typical (1) and a probability for being anomalous (0).
Example 26-15 Anomaly Detection with DBMS_DATA_MINING.APPLY
EXEC dbms_data_mining.apply ('SVMO_SH_Clas_sample','svmo_sh_sample_prepared', 'cust_id', 'one_class_output'); SELECT * from one_class_output where rownum < 11; CUST_ID PREDICTION PROBABILITY ---------- ---------- ----------- 101798 1 .567389309 101798 0 .432610691 102276 1 .564922469 102276 0 .435077531 102404 1 .51213544 102404 0 .48786456 101891 1 .563474346 101891 0 .436525654 102815 0 .500663683 102815 1 .499336317
Related Topics