www.tlab.it

Specificity Analysis


This T-LAB tool lets us check which lexical units (words, lemmas or categories) are typical or exclusive in a text or in a corpus subset defined by categorical variables. The option is therefore enabled only when the corpus is made up of at least two texts or at least two properly coded subsets (see Corpus Preparation).

Specificity Analysis allows us to carry out two types of comparisons concerning rows and columns of contingency tables:

1- between a part (e.g. the subset "A") and the whole (e.g. the corpus under analysis, "B");

2- between pairs of subsets ("A" and "B").

 

In either case, specificities involving both the intersection (typical words) and the differences (exclusive words) can be analysed.
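The distinction between intersection and differences can be illustrated with a minimal Python sketch. It is not T-LAB code: the function name `split_shared_exclusive` and the toy texts are hypothetical, and the sketch assumes that each subset is available as a simple table of occurrences per lexical unit. Shared units are only the candidates for "typical" words; whether they are actually over- or under-used still has to be tested statistically.

```python
from collections import Counter

def split_shared_exclusive(counts_a, counts_b):
    """Return (shared, only_a, only_b) as sorted lists of lexical units.

    counts_a, counts_b: mappings from lexical unit to its occurrences
    in subset "A" and in subset "B" (or in the rest of the corpus,
    for a part/whole comparison).
    """
    units_a = {unit for unit, n in counts_a.items() if n > 0}
    units_b = {unit for unit, n in counts_b.items() if n > 0}
    shared = sorted(units_a & units_b)   # intersection: candidates for "typical"
    only_a = sorted(units_a - units_b)   # difference: "exclusive" to A
    only_b = sorted(units_b - units_a)   # difference: "exclusive" to B
    return shared, only_a, only_b

# Toy data, invented for illustration only.
subset_a = Counter("the cat sat on the mat".split())
subset_b = Counter("the dog sat on the log".split())

shared, only_a, only_b = split_shared_exclusive(subset_a, subset_b)
print("shared:", shared)
print("exclusive to A:", only_a)
print("exclusive to B:", only_b)
```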

The computation methods are described in the corresponding glossary entry.

The lexical units considered can be either all of them (automatic settings) or only those selected by the user (customized settings).

The four types of possible comparisons are as follows:

1.1 - part/whole: "typical" lexical units

The table columns should be read as follows:
- LEMMA = specific lexical units (over- or under-used);
- SUB = occurrences of each LEMMA in the subset;
- TOT = occurrences of each LEMMA in the corpus or, for subset/subset comparisons, in the two compared subsets (see 2.1 below);
- CHI2 = chi-square value (or VTEST = test value);
- (p) = probability associated with the chi-square value (df = 1).

By clicking on the listed items it is possible to create various charts (see below).
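To make the relation between SUB, TOT, CHI2 and (p) more concrete, here is a minimal Python sketch of a standard Pearson chi-square test on a 2x2 table with one degree of freedom. It is an assumption that this matches the general idea of the computation; the exact T-LAB formula is the one given in the glossary, and the VTEST alternative is not covered. The function name `chi_square_specificity` and the parameters `n_sub` and `n_tot` (total occurrences of all lexical units in the subset and in the corpus) are hypothetical and do not appear in the T-LAB table.

```python
import math

def chi_square_specificity(sub, tot, n_sub, n_tot):
    """Pearson chi-square (df = 1) for one lexical unit: subset vs. rest.

    sub   : occurrences of the unit in the subset (SUB)
    tot   : occurrences of the unit in the corpus (TOT)
    n_sub : occurrences of all units in the subset (assumed known)
    n_tot : occurrences of all units in the corpus (assumed known)
    Returns (chi2, p, direction), direction being "over", "under" or "none".
    """
    a = sub                      # unit, inside the subset
    b = tot - sub                # unit, outside the subset
    c = n_sub - sub              # other units, inside the subset
    d = (n_tot - n_sub) - b      # other units, outside the subset
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0, "none"  # degenerate table: nothing to test
    chi2 = n_tot * (a * d - b * c) ** 2 / denom
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    if a * d > b * c:
        direction = "over"       # relatively more frequent in the subset
    elif a * d < b * c:
        direction = "under"      # relatively less frequent in the subset
    else:
        direction = "none"
    return chi2, p, direction

# Invented figures: 30 of the 50 occurrences fall in a subset that
# holds 1000 of the 5000 occurrences in the corpus.
chi2, p, direction = chi_square_specificity(30, 50, 1000, 5000)
print(f"CHI2 = {chi2:.2f}, p = {p:.2g}, {direction}-used")
```

For a subset/subset comparison the same sketch applies if TOT and `n_tot` are counted over the two compared subsets only, in line with the definition of TOT above.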

1.2 - part/whole: "exclusive" lexical units

2.1 - subset/subset: "typical" lexical units

2.2 - subset/subset: "exclusive" lexical units


All contingency tables can be easily exported and allow us to create various charts.
Moreover, by clicking on specific cells of the table (see below), it is possible to create an HTML file including all the elementary contexts in which the word in the row is present in the corresponding subset.