Tracking Web Opinions
Robust Extraction of Subjective Meaning

A. The Measurement of Meaning Revisited

The classic work on measuring emotive or affective meaning in texts is Charles Osgood's Theory of Semantic Differentiation. Osgood and his collaborators identify the aspect of meaning in which they are interested as
"a strictly psychological one: those cognitive states of human language users which are necessary antecedent conditions for selective encoding of lexical signs and necessary subsequent conditions in selective decoding of signs in messages" (Osgood et al. 1957, p. 318).
Their semantic differential technique uses several pairs of bipolar adjectives to scale the responses of subjects to words, short phrases, or texts. That is, subjects are asked to rate the meaning of these items on scales such as active-passive; good-bad; optimistic-pessimistic; positive-negative; strong-weak; serious-humorous; or ugly-beautiful.

Each pair of bipolar adjectives is a factor in the semantic differential technique. As a result, the differential technique can cope with quite a large number of aspects of affective meaning. A natural question to ask is whether each of these factors is equally important. Osgood et al. use factor analysis of extensive empirical tests to investigate this question. The surprising answer is that most of the variance in judgment can be explained by only three major factors. These three factors of affective or emotive meaning are the evaluative factor (e.g., good-bad); the potency factor (e.g., strong-weak); and the activity factor (e.g., active-passive). Among these three factors, the evaluative factor has the strongest relative weight.

B. Words with Attitude

We investigate measures for the evaluative factor of meaning based on the WordNet lexical database. WordNet is a database of semantic knowledge inspired by psycholinguistic and computational theories of human lexical memory (developed by the Princeton-based group of George Miller). The evaluative dimension of Osgood is typically determined using the adjectives `good' and `bad' (other operationalizations are possible depending on the subject under investigation). In WordNet, we look at all the words that can be reached from the adjectives `good' and `bad'; this turns out to be roughly 25% of all the adjectives (i.e., 5410 adjective words, or 5464 adjective synsets).

WordNet neighborhood of adjective good

Depth is the maximal length of a synonymy path in WordNet; a higher familiarity value filters out uncommon words.
  1. adjective good depth 1, familiarity 1;
  2. adjective good depth 2, familiarity 1;
  3. adjective good depth 3, familiarity 1;
  4. adjective good depth 4, familiarity 1;
  5. adjective good depth 4, familiarity 3; and
  6. adjective good depth 4, familiarity 6.

WordNet neighborhood of adjective bad

  1. adjective bad depth 1, familiarity 1;
  2. adjective bad depth 2, familiarity 1;
  3. adjective bad depth 3, familiarity 1;
  4. adjective bad depth 4, familiarity 1;
  5. adjective bad depth 4, familiarity 3; and
  6. adjective bad depth 4, familiarity 6.
Perhaps surprisingly, the adjectives `good' and `bad' are themselves closely related in WordNet: there exists a 5-long synonymy path good, sound, heavy, big, bad. Although this is remarkable, it is not due to some error in the WordNet database (there exist several paths of length 5). Part of the explanation seems to be the wide applicability of these two adjectives (WordNet has 14 senses of bad and 25 senses of good). Compare Milgram's small-world problem, which predicts a mean distance of 6 between arbitrary people. Using this to our advantage, we now consider not only the shortest distance to `good' but also the shortest distance to its antonym `bad.'

WordNet geodesic distances to both `good' and `bad'

  1. adjectives good and bad depth 4, familiarity 2;
  2. adjectives good and bad depth 4, familiarity 3; and
  3. adjectives good and bad depth 4, familiarity 4.
Specifically, for each adjective, we calculate the evaluative dimension as its distance to `bad' minus its distance to `good,' divided by the distance between `good' and `bad.' This gives a value in [-1,1], negative for words on the `bad' side of WordNet, and positive for words on the `good' side of WordNet. In the same way, we can measure the potency dimension by considering distances from `strong' and `weak,' and the activity dimension using `active' and `passive.'
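
The distance-based measure described above can be sketched in a few lines of Python. This is an illustration only, not the code behind this page: the graph is a tiny hand-made stand-in for WordNet's synonymy relation (only the chain good, sound, heavy, big, bad comes from the text; the words `fine' and `awful' and all function names are hypothetical), and distance is assumed to be the number of edges on a shortest path, so the five-word chain counts as distance 4 here.

```python
from collections import deque

# Toy synonymy graph, NOT real WordNet data. The chain
# good - sound - heavy - big - bad is taken from the text;
# the 'fine' and 'awful' edges are hypothetical additions.
EDGES = [
    ("good", "sound"), ("sound", "heavy"), ("heavy", "big"), ("big", "bad"),
    ("good", "fine"), ("bad", "awful"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of word pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first search: edge count of the shortest path to each reachable word."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        for neighbour in graph[word]:
            if neighbour not in dist:
                dist[neighbour] = dist[word] + 1
                queue.append(neighbour)
    return dist


def dimension_score(graph, word, positive, negative):
    """(d(word, negative) - d(word, positive)) / d(positive, negative)."""
    d_pos = distances_from(graph, positive)
    d_neg = distances_from(graph, negative)
    return (d_neg[word] - d_pos[word]) / d_pos[negative]


GRAPH = build_graph(EDGES)
for w in ("good", "sound", "heavy", "big", "bad"):
    print(w, dimension_score(GRAPH, w, "good", "bad"))
```

Passing `strong' and `weak,' or `active' and `passive,' as the two reference words would give the potency and activity scores in the same way, provided the graph contains those words. The score stays within [-1,1] because shortest-path distances satisfy the triangle inequality.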

C. Analysing Internet Discussion Sites

We can score a text on the evaluative, potency, and activity dimensions of subjective meaning by simply adding up the scores of the individual adjectives contained in it. We apply this to postings on Internet discussion sites (this data is updated on a daily basis).
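
As a rough sketch of this summation, assume we already have a lexicon that maps each adjective to its three scores. The lexicon values below are made up for illustration, the tokenizer is a simple regular expression, and any token found in the lexicon is treated as an adjective without part-of-speech tagging; none of these choices is taken from the actual system.

```python
import re

DIMENSIONS = ("evaluative", "potency", "activity")

# Hypothetical scores for illustration only, not values derived from WordNet.
LEXICON = {
    "good":   {"evaluative": 1.0,  "potency": 0.0,  "activity": 0.0},
    "bad":    {"evaluative": -1.0, "potency": 0.0,  "activity": 0.0},
    "strong": {"evaluative": 0.5,  "potency": 1.0,  "activity": 0.0},
    "weak":   {"evaluative": -0.5, "potency": -1.0, "activity": 0.0},
    "active": {"evaluative": 0.0,  "potency": 0.0,  "activity": 1.0},
}


def score_text(text, lexicon=LEXICON):
    """Sum the per-dimension scores of every lexicon word found in the text."""
    totals = {dim: 0.0 for dim in DIMENSIONS}
    for token in re.findall(r"[a-z]+", text.lower()):
        entry = lexicon.get(token)
        if entry is not None:
            for dim in DIMENSIONS:
                totals[dim] += entry[dim]
    return totals


print(score_text("A good and strong speech, but a weak reply."))
```

Because the scores are summed rather than averaged, longer postings with many adjectives carry more weight than short ones.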

UK political parties

This is some data from Usenet activity in newsgroups about English politics, on the three main political parties in the UK, i.e., New Labour, Conservative Party, and the Liberal Democrats.

Raw data:

  1. Number of Postings per day (updated daily).
  2. Evaluative Factor per day (updated daily).
  3. Potency Factor per day (updated daily).
  4. Activity Factor per day (updated daily).
Although the adjectives in WordNet are well-balanced between the `good' and `bad' sides (the mean score over 5410 words is -0.0089), the data show a considerable bias towards `good.' This may reveal a tendency of people to elaborate positive views using many words, while formulating negative views tersely. After removing this `positivity' bias, and smoothing the data over 7 days, the last three months are:
  1. Number of Postings per day (updated daily).
  2. Evaluative Factor per day (updated daily).
  3. Potency Factor per day (updated daily).
  4. Activity Factor per day (updated daily).
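
The text does not spell out how the `positivity' bias is removed or which smoothing is used. One plausible reading, sketched below purely as an assumption, is to subtract the mean of the daily series and then take a trailing 7-day moving average. The function names and the sample numbers are hypothetical.

```python
def remove_bias(series):
    """Center a daily series by subtracting its mean (assumed bias correction)."""
    if not series:
        return []
    mean = sum(series) / len(series)
    return [value - mean for value in series]


def smooth(series, window=7):
    """Trailing moving average; early days use as many values as are available."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up daily evaluative totals, for illustration only.
daily = [3.0, 5.0, 4.0, 6.0, 2.0, 5.0, 3.0, 4.0, 6.0, 2.0]
centered = remove_bias(daily)
print(centered)
print(smooth(centered))
```

Other corrections, such as normalizing by the number of postings per day or using a centered window, would be equally compatible with the description.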

Female Teenstars

This is some data from Usenet activity in newsgroups about music, on four of the currently popular teenstars, i.e., female singers Britney Spears, Christina Aguilera, Jessica Simpson, Mandy Moore.

Raw data:

  1. Number of Postings per day (updated daily).
  2. Evaluative Factor per day (updated daily).
  3. Potency Factor per day (updated daily).
  4. Activity Factor per day (updated daily).
After removing the `positivity' bias, and smoothing the data over 7 days, the last three months are:
  1. Number of Postings per day (updated daily).
  2. Evaluative Factor per day (updated daily).
  3. Potency Factor per day (updated daily).
  4. Activity Factor per day (updated daily).