12 Performing Sentiment Analysis Using Oracle Text
Sentiment analysis enables you to identify a positive or negative sentiment with regard to a search topic.
This chapter contains the following topics:
12.1 Overview of Sentiment Analysis
Sentiment analysis uses trained sentiment classifiers to provide sentiment information for documents or topics within documents.
This section contains the following topics:
12.1.1 About Sentiment Analysis
Oracle Text enables you to perform sentiment analysis for a topic or document by using sentiment classifiers that are trained to identify sentiment metadata.
With growing amounts of data, it would be beneficial if organizations could gain more insights into their data rather than just obtaining “hits” in response to a search query. The insight could be in the form of answering certain basic types of queries (such as weather queries or queries about recent events) or providing opinions about a user-specified topic. Keyword searches provide a list of results containing the search term. However, to identify a sentiment or opinion with regard to the search term, you need to perform further data analysis by browsing through all the results and then manually locating the required sentiment information. Sentiment analysis provides a one-step process to identify sentiment information within a set of documents.
Sentiment analysis is the process of identifying and extracting sentiment metadata related to a specified topic or entity from a set of documents. The sentiment is identified using trained sentiment classifiers. When you run a query using sentiment analysis, in addition to the search results, sentiment metadata is also identified and displayed. Sentiment analysis provides answers to questions such as “Is a product review positive or negative?” or “Is the customer satisfied or dissatisfied?”. For example, from a document set consisting of multiple reviews for a particular product, you can determine an overall sentiment that indicates if the product is good or bad.
12.1.2 About Sentiment Classifiers
A sentiment classifier is a type of document classifier that is used to extract sentiment metadata related to a topic or document.
To perform sentiment analysis using a sentiment classifier, you must first associate a sentiment classifier preference with the sentiment classifier and then train the sentiment classifier.
User-defined sentiment classifiers can be associated with a sentiment classifier preference of type SENTIMENT_CLASSIFIER. A sentiment classifier preference specifies the parameters that are used to train a sentiment classifier. These parameters are defined as attributes of the sentiment classifier preference. You can either create a sentiment classifier preference or use the default CTXSYS.DEFAULT_SENTIMENT_CLASSIFIER. To create a user-defined sentiment classifier preference, use the CTX_DDL.CREATE_PREFERENCE procedure to define a sentiment classifier preference and the CTX_DDL.SET_ATTRIBUTE procedure to define its parameters.
To train a sentiment classifier, you need to provide an associated sentiment classifier preference, a training set of documents, and the sentiment categories. If no classifier preference is specified, then Oracle Text uses default values for the training parameters. The sentiment classifier is trained using the set of sample documents and the specified preference. Each sample document is assigned to a particular category. From these samples, Oracle Text deduces a set of classification rules that define how sentiment analysis must be performed using this sentiment classifier. Use the CTX_CLS.SA_TRAIN_MODEL procedure to train a sentiment classifier.
Typically, you would define and train separate sentiment classifiers for different categories of documents such as finance, product reviews, music, and so on. If you do not want to create your own sentiment classifier or if suitable training data is not available to train your classifier, you can use the default sentiment classifier provided by Oracle Text. The default sentiment classifier is unsupervised.
Note:
The default sentiment classifier works only with AUTO_LEXER. Do not use AUTO_LEXER when using user-defined sentiment classifiers.
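As a minimal sketch of the index setup that the default sentiment classifier expects, the following commands create a lexer preference of type AUTO_LEXER and a context index that uses it. The names review_lexer and reviews_auto_idx are hypothetical, and a camera_reviews table with a review_text column is assumed to exist.

```sql
-- Hypothetical lexer preference of type AUTO_LEXER,
-- for use with the default sentiment classifier.
exec ctx_ddl.create_preference('review_lexer','AUTO_LEXER');

-- Hypothetical index name; camera_reviews(review_text) is assumed to exist.
create index reviews_auto_idx on camera_reviews(review_text)
indextype is ctxsys.context
parameters ('lexer review_lexer');
```

An index that is used with a user-defined sentiment classifier would specify a different lexer type, for example BASIC_LEXER.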
12.1.3 About Performing Sentiment Analysis with Oracle Text
To perform sentiment analysis, you run a sentiment query that includes the sentiment classifier that must be used to identify sentiment information. The classifier can be the default sentiment classifier or a user-defined sentiment classifier.
Sentiment analysis can be performed only as part of a search operation. Oracle Text searches for the specified keywords and generates a result set. Then, sentiment analysis is performed on the result set to identify a sentiment score for each result. If you do not explicitly specify a sentiment classifier in your query, the default classifier is used.
You can either identify a single sentiment for the entire document or separate sentiments for each topic within a document. Most often, a document contains multiple topics and the author’s sentiment towards each topic may be different. In such cases, document-level sentiment scores may not be useful because they cannot identify sentiment scores associated with different topics in the document. Identifying topic-level sentiment scores provides the required answers. For example, when searching through a set of documents containing reviews for a camera, a document-level sentiment tells you whether the camera is good or not. Assume that you want the general opinion about the picture quality of the particular camera. Performing a topic-level sentiment analysis, with “picture quality” as one of the topics, will provide the required information.
Note:
If you do not specify a topic of interest for sentiment analysis, then Oracle Text returns the overall sentiment for the entire document.
12.2 Creating a Sentiment Classifier Preference
Use the CTX_DDL.CREATE_PREFERENCE procedure to create a sentiment classifier preference and the CTX_DDL.SET_ATTRIBUTE procedure to define its attributes. The classifier type associated with a user-defined sentiment classifier preference is SENTIMENT_CLASSIFIER.
To create a sentiment classifier preference:
- Define a sentiment classifier preference using the CTX_DDL.CREATE_PREFERENCE procedure. The classifier must be of type SENTIMENT_CLASSIFIER.
- Define attributes for the sentiment classifier preference using the CTX_DDL.SET_ATTRIBUTE procedure. The attributes define the parameters that are used to train the sentiment classifier.
Example 12-1 Creating a Sentiment Classifier Preference
The following example creates a sentiment classifier preference named clsfier_camera. This preference will be used to classify a set of documents that contain reviews for SLR cameras.
- Define a sentiment classifier preference of type SENTIMENT_CLASSIFIER.
The following command defines a sentiment classifier preference named clsfier_camera.
exec ctx_ddl.create_preference('clsfier_camera','SENTIMENT_CLASSIFIER');
- Define the attributes of the sentiment classifier preference clsfier_camera.
The following commands define attributes for the clsfier_camera sentiment classifier preference. The maximum number of features to be extracted is set to 1000 and the number of iterations for which the classifier runs is set to 600.
exec ctx_ddl.set_attribute('clsfier_camera','MAX_FEATURES','1000');
exec ctx_ddl.set_attribute('clsfier_camera','NUM_ITERATIONS','600');
For attributes that are not explicitly defined, the default values are used.
12.3 Training Sentiment Classifiers
Training a sentiment classifier generates the classification rules that will be used to provide a positive or negative sentiment with respect to a search keyword.
The following example trains a sentiment classifier that can perform sentiment analysis on user reviews of cameras:
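The training steps can be sketched as follows. This is an illustrative outline, not a complete training run: the table names training_camera and train_category, the index name tcamera, and the sample rows are hypothetical, and a real training set needs many more labeled documents. The sketch assumes that the category column uses 1 for positive and 2 for negative (with 0 for neutral), and that CTX_CLS.SA_TRAIN_MODEL takes, in order, the classifier name, the index name, the document ID column, the category table, its document ID and category columns, and the preference name. Confirm both assumptions against the Oracle Text Reference.

```sql
-- Hypothetical training documents; text is stored directly in the table.
create table training_camera (
  review_id number primary key,
  text      varchar2(2000)
);
insert into training_camera values (1, 'The lens is sharp and the autofocus is fast.');
insert into training_camera values (2, 'The flash stopped working after a month.');

-- Hypothetical category table that labels each training document.
-- Assumed encoding: 0 = neutral, 1 = positive, 2 = negative.
create table train_category (
  doc_id        number,
  category      number,
  category_desc varchar2(100)
);
insert into train_category values (1, 1, 'positive');
insert into train_category values (2, 2, 'negative');
commit;

-- Context index on the training documents.
create index tcamera on training_camera(text)
indextype is ctxsys.context;

-- Train the classifier clsfier_camera with the preference from Example 12-1.
begin
  ctx_cls.sa_train_model(
    'clsfier_camera',   -- sentiment classifier to train
    'tcamera',          -- context index on the training table
    'review_id',        -- document ID column in the training table
    'train_category',   -- category table
    'doc_id',           -- document ID column in the category table
    'category',         -- category column in the category table
    'clsfier_camera');  -- sentiment classifier preference
end;
/
```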
12.4 Performing Sentiment Analysis Using the CTX_DOC Package
Use the procedures in the CTX_DOC package to perform sentiment analysis on a single document within a document set. For each document, you can either determine a single sentiment score for the entire document or individual sentiment scores for each topic within the document.
Before you perform sentiment analysis, you must create a context index on the document set. The following command creates a context index camera_revidx on the document set contained in the camera_reviews table.
create index camera_revidx on camera_reviews(review_text)
indextype is ctxsys.context
parameters ('lexer mylexer stoplist ctxsys.default_stoplist');
To perform sentiment analysis with the CTX_DOC package, use one of the following methods:
Example 12-2 Obtaining a Single Sentiment Score for a Document
The following example uses the sentiment classifier clsfier_camera to provide a single aggregate sentiment score for the entire document. The sentiment classifier has been created and trained. The table containing the document set has a context index called camera_revidx. The doc_id of the document within the document table for which sentiment analysis must be performed is 49. The topic for which a sentiment score is being generated is ‘Nikon’.
select ctx_doc.sentiment_aggregate('camera_revidx','49','Nikon','clsfier_camera') from dual;
CTX_DOC.SENTIMENT_AGGREGATE('CAMERA_REVIDX','49','NIKON','CLSFIER_CAMERA')
--------------------------------------------------------------------------
74
1 row selected.
Example 12-3 Obtaining a Single Sentiment Score Using the Default Classifier
The following example uses the default sentiment classifier to provide an aggregate sentiment score for the entire document. The table containing the document set has a context index called camera_revidx. The doc_id of the document, within the document table, for which sentiment analysis must be performed is 1.
select ctx_doc.sentiment_aggregate('camera_revidx','1') from dual;
CTX_DOC.SENTIMENT_AGGREGATE('CAMERA_REVIDX','1')
--------------------------------------------
2
1 row selected.
Example 12-4 Obtaining Sentiment Scores for Each Topic within a Document
The following example uses the sentiment classifier clsfier_camera to generate sentiment scores for each segment within the document. The sentiment classifier has been created and trained. The table containing the document set has a context index called camera_revidx. The doc_id of the document within the document table for which sentiment analysis must be performed is 49. The topic for which a sentiment score is being generated is ‘Nikon’. The result table, restab, that will be populated with the analysis results has been created with the columns snippet (CLOB) and score (NUMBER).
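A result table with that shape could be created as follows before calling CTX_DOC.SENTIMENT. The table name, column names, and column types are taken from the description above; nothing else about the table is assumed.

```sql
-- Result table for CTX_DOC.SENTIMENT: one row per scored snippet.
create table restab (
  snippet clob,
  score   number
);
```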
exec ctx_doc.sentiment('camera_revidx','49','Nikon','restab','clsfier_camera', starttag=>'<<', endtag=>'>>');
SQL> select * from restab;
SNIPPET
--------------------------------------------------------------------------------
SCORE
----------
It took <<Nikon>> a while to produce a superb compact 85mm lens, but this time they finally got it right.
65
Without a doubt, this is a fine portrait lens for photographing head-and-shoulder portraits (The only lens which is optically better is <<Nikon>>'s legendary 105mm f2.5 Nikkor lens, and its close optical twin, the 105mm f2.8 Micro Nikkor.
75
Since the 105mm f2.5 Nikkor lens doesn't have an autofocus version, then this might be the perfect moderate telephoto lens for owners of <<Nikon>> autofocus SLR cameras.
84
3 rows selected.
Example 12-5 Obtaining a Sentiment Score for a Topic Within a Document
The following example uses the sentiment classifier tdrbrtsent03_cl to generate a sentiment score for each segment within the document. The sentiment classifier has been created and trained. The table containing the document set has a context index called tdrbrtsent03_idx. The doc_id of the document within the document table for which sentiment analysis must be performed is 1. The topic for which a sentiment score is being generated is ‘movie’. The result table, tdrbrtsent03_rtab, that will be populated with the analysis results has been created with the columns snippet and score.
SQL> exec ctx_doc.sentiment('tdrbrtsent03_idx','1','movie','tdrbrtsent03_rtab','tdrbrtsent03_cl');
PL/SQL procedure successfully completed.
SQL> select * from tdrbrtsent03_rtab;
SNIPPET
--------------------------------------------------------------------------------
SCORE
----------
the <b>movie</b> is a bit overlong , but nicholson is such good fun that the running time passes by pretty quickly
-62
1 row selected.
See Also:
- CTX_DOC.SENTIMENT_AGGREGATE in the Oracle Text Reference
- CTX_DOC.SENTIMENT in the Oracle Text Reference
12.5 Performing Sentiment Analysis Using Result Set Interface
The XML Query Result Set Interface (RSI) enables you to perform sentiment analysis on a set of documents by using either the default sentiment classifier or a user-defined sentiment classifier. The documents on which sentiment analysis must be performed are stored in a document table.
The sentiment element in the input RSI is used to indicate that sentiment analysis must be performed at query time in addition to other operations specified in the result set descriptor. If you specify a value for the classifier attribute of the sentiment element, then the specified sentiment classifier is used to perform the sentiment analysis. If the classifier attribute is omitted, then Oracle Text performs sentiment analysis using the default sentiment classifier. The sentiment element contains a child element called item that specifies the topic or concept about which a sentiment must be generated during sentiment analysis.
You can either generate a single sentiment score for each document or separate sentiment scores for each topic within the document. Use the agg attribute of the item element to generate a single aggregated sentiment score for each document.
Sentiment classification can be performed using a keyword query or by using the ABOUT operator. When you use the ABOUT operator, the result set includes synonyms of the keyword that are identified using the thesaurus.
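As a hedged sketch of the second option, the following block runs a result set query whose driving query uses the ABOUT operator instead of a plain keyword. It assumes the camera_revidx index and the trained clsfier_camera classifier that are used elsewhere in this chapter, and that the index supports ABOUT queries. The descriptor layout follows the same pattern as the keyword query in Example 12-6.

```sql
-- CLOB bind variable that receives the XML result set.
var rs clob;
exec dbms_lob.createtemporary(:rs, TRUE, DBMS_LOB.SESSION);

begin
  -- Driving query: documents about the concept 'camera'.
  -- Sentiment is generated for the topic 'lens'.
  ctx_query.result_set('camera_revidx', 'about(camera)', '
    <ctx_result_set_descriptor>
      <hitlist start_hit_num="1" end_hit_num="10" order="score desc">
        <sentiment classifier="clsfier_camera">
          <item topic="lens" />
        </sentiment>
      </hitlist>
    </ctx_result_set_descriptor>', :rs);
end;
/
```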
To perform sentiment analysis using RSI:
Example 12-6 Input Result Set Descriptor to Perform Sentiment Analysis
The following example performs sentiment analysis and generates a sentiment for the topic ‘lens’. The driving query is a keyword query for ‘camera’. The sentiment element specifies that sentiment analysis must be performed using the sentiment classifier clsfier_camera. This classifier has been previously created and trained using the CTX_CLS.SA_TRAIN_MODEL procedure. camera_revidx is a context index on the document set table.
The sentiment score ranges from -100 to 100. A positive score indicates positive sentiment whereas a negative score indicates a negative sentiment. The absolute value of the score is indicative of the magnitude of positive/negative sentiment.
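To make the score range concrete, the following query labels each stored score by its sign and lists the strongest sentiments first. It is a sketch that assumes a result table such as restab from Example 12-4, with the columns snippet and score. The label 'neutral' for a score of 0 is an assumption made for illustration, not a value that Oracle Text returns.

```sql
-- Label each snippet by the sign of its score (-100 to 100)
-- and sort by the magnitude of the sentiment.
select snippet,
       score,
       case
         when score > 0 then 'positive'
         when score < 0 then 'negative'
         else 'neutral'   -- assumed label for a score of 0
       end as sentiment_label
from restab
order by abs(score) desc;
```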
To perform sentiment analysis and obtain a sentiment score for each topic within the document:
- Create the result set table, rs, that will store the results of the search operation.
SQL> var rs clob;
SQL> exec dbms_lob.createtemporary(:rs, TRUE, DBMS_LOB.SESSION);
- Perform sentiment analysis as part of a search query.
The keyword being searched for is ‘camera’. The topic for which sentiment analysis is performed is ‘lens’.
begin
  ctx_query.result_set('camera_revidx','camera','
    <ctx_result_set_descriptor>
      <hitlist start_hit_num="1" end_hit_num="10" order="score desc">
        <sentiment classifier="clsfier_camera">
          <item topic="lens" />
          <item topic="picture quality" agg="true" />
        </sentiment>
      </hitlist>
    </ctx_result_set_descriptor>',:rs);
end;
/
- View the results stored in the result table.
The XML result set can be used by other applications for further processing. Some of the output has been removed for brevity. Notice that each segment within the document has its own sentiment score.
SQL> select xmltype(:rs) from dual;
XMLTYPE(:RS)
--------------------------------------------------------------------------------
<ctx_result_set>
  <hitlist>
    <hit>
      <sentiment>
        <item topic="lens">
          <segment>
            <segment_text>The first time it was sent in was because the <b>lens </b> door failed to turn on the camera and it was almost to come off of its track . Eight months later, the flash quit working in all modes AND the door was failing AGAIN!</segment_text>
            <segment_score>-81</segment_score>
          </segment>
        </item>
        <item topic="picture quality">
          <score> -75 </score>
        </item>
      </sentiment>
    </hit>
    <hit>
      <sentiment>
        <item topic="lens">
          <segment>
            <segment_text>I was actually quite impressed with it. Powerful zoom , sharp <b>lens</b>, decent picture quality. I also played with some other Panasonic models in various stores just to get a better feel for them, as well as spent a few hours on </segment_text>
            <segment_score> 67 </segment_score>
          </segment>
        </item>
        <item topic="picture quality">
          <score>-1</score>
        </item>
      </sentiment>
    </hit>
    . . .
    . . .
  </hitlist>
</ctx_result_set>
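If the segment-level scores are needed as rows rather than as XML, one possible approach is to flatten the result set with the SQL XMLTABLE function. The following query is a sketch that assumes the bind variable :rs still holds the result set shown above. The column aliases score_text and score_num are hypothetical. Scores are read as text and trimmed before conversion because some values in the output are padded with spaces.

```sql
-- One row per scored segment, with the topic taken from the parent item element.
select x.topic,
       to_number(trim(x.score_text)) as score_num
from xmltable(
       '/ctx_result_set/hitlist/hit/sentiment/item/segment'
       passing xmltype(:rs)
       columns
         topic      varchar2(100) path '../@topic',
         score_text varchar2(20)  path 'segment_score'
     ) x
order by score_num;
```

Items that were requested with agg="true" carry a single score element instead of segment elements, so they do not appear in this query.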