Tracking Web Opinions
Robust Extraction of Subjective Meaning
A. The Measurement of Meaning Revisited
The classic work on measuring emotive or affective meaning in texts is
Charles Osgood's Theory of Semantic Differentiation.
Osgood and his collaborators identify the aspect of meaning in which
they are interested as
a strictly psychological one: those cognitive states of human
language users which are necessary antecedent conditions for
selective encoding of lexical signs and necessary subsequent
conditions in selective decoding of signs in messages.
(Osgood et al. 1957, p.318).
Their semantic differential technique uses several pairs of
bipolar adjectives to scale the responses of subjects to words, short
phrases, or texts. That is, subjects are asked to rate the meaning of
these items on scales like active-passive; good-bad;
optimistic-pessimistic; positive-negative; strong-weak;
serious-humorous; or ugly-beautiful.
Each pair of bipolar adjectives is a factor in the semantic
differential technique. As a result, the differential technique can
cope with quite a large number of aspects of affective meaning. A
natural question to ask is whether each of these factors is equally
important. Osgood et al. use factor analysis of extensive
empirical tests to investigate this question. The surprising answer
is that most of the variance in judgment could be explained by only
three major factors. These three factors of the affective or emotive
meaning are the evaluative factor (e.g., good-bad); the
potency factor (e.g., strong-weak); and the activity
factor (e.g., active-passive). Among these three factors, the
evaluative factor has the strongest relative weight.
B. Words with Attitude
We investigate measures for the evaluative factor of meaning based on
the WordNet lexical database.
WordNet is a database of semantic knowledge
inspired by psycholinguistic and computational theories of human
lexical memory (developed by the Princeton-based
group of George Miller).
The evaluative dimension of Osgood is
typically determined using the adjectives `good' and `bad' (other
operationalizations are possible depending on the subject under
investigation). In WordNet, we look at all the words that can be
reached from the adjectives `good' and `bad' by following synonymy
relations; this turns out to be roughly 25% of all the adjectives
(i.e., 5410 adjective words, or 5464 adjective synsets).
WordNet neighborhood of adjective good
Depth is the maximal length of a synonymy path in WordNet; a higher
familiarity value filters out uncommon words.
- adjective good depth 1, familiarity 1;
- adjective good depth 2, familiarity 1;
- adjective good depth 3, familiarity 1;
- adjective good depth 4, familiarity 1;
- adjective good depth 4, familiarity 3; and
- adjective good depth 4, familiarity 6.
WordNet neighborhood of adjective bad
- adjective bad depth 1, familiarity 1;
- adjective bad depth 2, familiarity 1;
- adjective bad depth 3, familiarity 1;
- adjective bad depth 4, familiarity 1;
- adjective bad depth 4, familiarity 3; and
- adjective bad depth 4, familiarity 6.
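The depth-limited neighborhoods listed above can be sketched as a breadth-first search over a synonymy graph. The following is a minimal illustration, not the code behind the figures: the graph `SYNONYMS` is a hypothetical toy (only the chain good, sound, heavy, big, bad is taken from this page; `fine' and `awful' are invented for illustration), the function name `neighborhood` is an assumption, and the familiarity filter is left out.

```python
from collections import deque

# Hypothetical toy synonymy graph. Only the good-sound-heavy-big-bad chain
# is quoted from the text; the other entries are made up for illustration
# and are NOT real WordNet data.
SYNONYMS = {
    "good": {"sound", "fine"},
    "sound": {"good", "heavy"},
    "fine": {"good"},
    "heavy": {"sound", "big"},
    "big": {"heavy", "bad"},
    "bad": {"big", "awful"},
    "awful": {"bad"},
}


def neighborhood(graph, start, max_depth):
    """Return {word: distance} for every word within max_depth synonymy steps."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if dist[word] == max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in graph.get(word, ()):
            if nxt not in dist:
                dist[nxt] = dist[word] + 1
                queue.append(nxt)
    return dist


if __name__ == "__main__":
    for depth in (1, 2, 4):
        print(depth, sorted(neighborhood(SYNONYMS, "good", depth)))
```

On real data the graph would be built from WordNet synonymy relations, and a familiarity threshold would presumably drop uncommon words before or after the search.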
Perhaps surprisingly, the adjectives `good' and `bad' are themselves
closely related in WordNet. There exists a 5-long synonymy path: good,
sound, heavy, big, and bad. Although this
is perhaps remarkable, it is not due to some error in the WordNet
database (there exist several paths of length 5). Part of the
explanation seems to be the wide applicability of these two adjectives
(WordNet has 14 senses of bad and 25 senses of good). Compare
Milgram's small-world problem, which predicts a mean distance of 6
between arbitrary people. Using this to our advantage, we now consider
not only the shortest distance to `good' but also the shortest distance
to the antonym `bad.'
WordNet geodesic distances to both `good' and `bad'
- adjectives good and bad depth 4, familiarity 2;
- adjectives good and bad depth 4, familiarity 3; and
- adjectives good and bad depth 4, familiarity 4.
Specifically, for each adjective, we calculate the evaluative
dimension by its distance to `bad' minus its distance to `good,'
divided by the distance between `good' and `bad.' This gives a value
in [-1,1], negative for words on the `bad' side of WordNet, and
positive for words on the `good' side of WordNet. In the same way, we
can measure the potency dimension by considering
distances from `strong' and `weak,' and the activity
dimension using `active' and `passive.'
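The evaluative measure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the original implementation: the graph `TOY_GRAPH` is hypothetical (only the good-sound-heavy-big-bad chain comes from this page), the names `distance` and `evaluative` are invented here, and distance is assumed to be counted in synonymy steps.

```python
from collections import deque

# Hypothetical toy synonymy graph; NOT real WordNet data apart from the
# good-sound-heavy-big-bad chain quoted in the text.
TOY_GRAPH = {
    "good": {"sound", "fine"},
    "sound": {"good", "heavy"},
    "fine": {"good"},
    "heavy": {"sound", "big"},
    "big": {"heavy", "bad"},
    "bad": {"big", "awful"},
    "awful": {"bad"},
}


def distance(graph, source, target):
    """Shortest number of synonymy steps between two words, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        word, d = queue.popleft()
        for nxt in graph.get(word, ()):
            if nxt == target:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None  # target is not reachable from source


def evaluative(graph, word, pos="good", neg="bad"):
    """(d(word, neg) - d(word, pos)) / d(pos, neg), a value in [-1, 1]."""
    d_pos = distance(graph, word, pos)
    d_neg = distance(graph, word, neg)
    d_poles = distance(graph, pos, neg)
    if d_pos is None or d_neg is None or not d_poles:
        return None  # word is not connected to both poles
    return (d_neg - d_pos) / d_poles


if __name__ == "__main__":
    for w in ("good", "sound", "heavy", "big", "bad"):
        print(w, evaluative(TOY_GRAPH, w))
```

In this toy graph `sound' scores 0.5 and `heavy' scores 0.0. Passing `strong' and `weak' (or `active' and `passive') as `pos` and `neg` would give the potency (or activity) score in the same way.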
C. Analysing Internet Discussion Sites
We can score a text on
the evaluative, potency, and activity dimensions of subjective meaning
by simply adding up the scores of the individual adjectives contained
in it. We apply this to postings on Internet discussion sites (this
data is updated on a daily basis).
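Scoring a text by summing adjective scores might look like the sketch below. The lexicon `SCORES` holds hypothetical values chosen for illustration, not scores computed from WordNet, and the function name `score_text` is an assumption. The sketch also matches every token against the lexicon without part-of-speech tagging, which is a simplification.

```python
import re

# Hypothetical adjective scores on the evaluative dimension (illustrative
# values only, not derived from WordNet).
SCORES = {"good": 1.0, "sound": 0.5, "heavy": 0.0, "big": -0.5, "bad": -1.0}


def score_text(text, scores):
    """Sum the scores of all lexicon words that occur in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(scores[t] for t in tokens if t in scores)


if __name__ == "__main__":
    posting = "A good plan, but bad timing and a big risk."
    print(score_text(posting, SCORES))  # 1.0 - 1.0 - 0.5 = -0.5
```

Because the score is a plain sum, longer postings with many adjectives weigh more heavily than short ones.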
UK political parties
These data come from Usenet activity in newsgroups about English
politics and cover the three main political parties in the UK, i.e.,
New Labour, the Conservative Party, and the Liberal Democrats.
Raw data:
- Number of Postings per day
(updated daily).
- Evaluative Factor per day
(updated daily).
- Potency Factor per day
(updated daily).
- Activity Factor per day
(updated daily).
Although the adjectives in WordNet are well-balanced between the `good'
and `bad' sides (the mean score over 5410 words is -0.0089), the data show a
considerable bias towards `good.' This may reveal a tendency of people to
elaborate positive views using many words, while formulating
negative views tersely.
After removing this `positivity' bias, and smoothing the data over 7 days,
the last three months are:
- Number of Postings per day
(updated daily).
- Evaluative Factor per day
(updated daily).
- Potency Factor per day
(updated daily).
- Activity Factor per day
(updated daily).
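The page does not say how the `positivity' bias is removed or how the 7-day smoothing is done. One plausible reading, shown purely as an assumption, is to subtract the mean daily score and then take a trailing 7-day moving average. The function names `remove_bias` and `smooth` are hypothetical.

```python
def remove_bias(series):
    """Subtract the mean so the series is centered on zero (assumed method)."""
    if not series:
        return []
    mean = sum(series) / len(series)
    return [x - mean for x in series]


def smooth(series, window=7):
    """Trailing moving average; early days average whatever is available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


if __name__ == "__main__":
    # Hypothetical daily evaluative scores, not real Usenet data.
    daily = [3.0, 5.0, 4.0, 6.0, 2.0, 5.0, 3.0, 4.0]
    print(smooth(remove_bias(daily)))
```

A centered window or a per-topic baseline would be equally defensible choices; the sketch only shows the general shape of the computation.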
Female Teenstars
These data come from Usenet activity in newsgroups about music
and cover four of the currently popular teenstars, i.e., female singers
Britney Spears, Christina Aguilera, Jessica Simpson, and Mandy Moore.
Raw data:
- Number of Postings per day
(updated daily).
- Evaluative Factor per day
(updated daily).
- Potency Factor per day
(updated daily).
- Activity Factor per day
(updated daily).
After removing the `positivity' bias, and smoothing the data over 7 days,
the last three months are:
- Number of Postings per day
(updated daily).
- Evaluative Factor per day
(updated daily).
- Potency Factor per day
(updated daily).
- Activity Factor per day
(updated daily).