Specificity Analysis


N.B.: The pictures shown in this section have been obtained by using a previous version of T-LAB; the corresponding outputs look slightly different in T-LAB 10. In particular, starting from the 2021 version, a quick-access picture gallery, which works as an additional menu, allows you to switch between the various outputs with a single click. Moreover, you can easily evaluate similarities (i.e. Cosine) and differences (i.e. Inter-Textual Distance) between corpus subsets (from 2 to 150), and thus also detect duplicate and near-duplicate documents (see the pictures below).
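As a rough illustration of the Cosine measure mentioned above, the following sketch compares two corpus subsets represented as bag-of-words frequency vectors. The example texts and the function name are invented for the example; T-LAB performs this computation internally on its own frequency tables.

# Minimal sketch of the cosine similarity between two corpus subsets,
# each reduced to a bag-of-words frequency vector. Toy data only.
from collections import Counter
import math

def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two word-frequency vectors."""
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

subset_a = Counter("the economy grows while the deficit shrinks".split())
subset_b = Counter("the economy shrinks while the deficit grows".split())

# Same words, different order: at the bag-of-words level these two
# subsets are near-duplicates, and the cosine is 1.0.
print(round(cosine_similarity(subset_a, subset_b), 2))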

This T-LAB tool enables us to check which lexical units (words, lemmas or categories) are typical of or exclusive to a text or a corpus subset defined by a categorical variable, as well as to check the 'typical contexts' of each analysed subset (e.g. the 'typical' sentences used by a specific political leader).

In detail:

The typical lexical units, i.e. those which are significantly over-used or under-used, are detected by means of the chi-square or the test value computation (a minimal sketch follows the next point).

The typical elementary contexts are detected by computing and summing the normalized TF-IDF values assigned to the words that make up each sentence or paragraph.
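A minimal sketch of the chi-square option, assuming the usual 2x2 contingency table (the word versus all other words, crossed with the subset versus the rest of the corpus): the counts, names and threshold below are invented for illustration and are not taken from T-LAB.

# Chi-square (df = 1) for flagging a word as over- or under-used in a
# subset. All figures are hypothetical.
def chi_square_2x2(word_in_subset, subset_size, word_in_corpus, corpus_size):
    a = word_in_subset                      # word, inside the subset
    b = word_in_corpus - word_in_subset     # word, rest of the corpus
    c = subset_size - word_in_subset        # other words, inside the subset
    d = (corpus_size - subset_size) - b     # other words, rest of the corpus
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

# 'reform' occurs 50 times in a 10,000-token subset and 120 times in a
# 100,000-token corpus: is it over-used in the subset?
chi2_value = chi_square_2x2(50, 10_000, 120, 100_000)
print(round(chi2_value, 2))  # > 3.84 means significant at p < 0.05 (df = 1)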

Specificity Analysis allows us to carry out two types of comparisons:

1- between a part (e.g. the subset "A") and the whole (e.g. the corpus under analysis, "B");

2- between pairs of corpus subsets ("A" and "B").


In either case, specificities involving both the intersection ('typical' words) and the differences ('exclusive' words) can be analysed (see the sketch below).
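A minimal sketch of this distinction on two invented vocabularies: exclusive words appear in only one of the compared subsets, while typical words come from the shared vocabulary and are flagged by the chi-square (or test value) check.

# 'Exclusive' vs 'typical' words for two subsets A and B.
# The vocabularies are invented; T-LAB works on its own occurrence tables.
vocab_a = {"tax": 40, "reform": 25, "austerity": 12}   # word -> occurrences in A
vocab_b = {"tax": 35, "reform": 5, "solidarity": 18}   # word -> occurrences in B

exclusive_a = set(vocab_a) - set(vocab_b)   # {'austerity'}: only in A
exclusive_b = set(vocab_b) - set(vocab_a)   # {'solidarity'}: only in B
shared = set(vocab_a) & set(vocab_b)        # {'tax', 'reform'}

# Among the shared words, the 'typical' ones of A (or B) are those whose
# chi-square or test value signals a significant over-use in that subset.
print(sorted(exclusive_a), sorted(exclusive_b), sorted(shared))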

The details of the computation are given in the corresponding glossary entry (Specificity).

The lexical units considered can be either all of them (automatic settings) or only those selected by the user (customized settings).

The four types of possible comparisons are as follows:

1.1 - part/whole: "typical" lexical units

Table reading keys are as follows:
- LEMMA = specific lexical unit (over- or under-used);
- SUB = occurrences of the LEMMA in the subset;
- TOT = occurrences of the LEMMA in the corpus or in the two compared subsets (see 2.1 below);
- CHI2 = chi-square value (or VTEST = test value);
- (p) = probability associated with the chi-square value (df = 1; see the snippet below).
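As a side note on the (p) column: for a 2x2 table (hence df = 1), the probability is the right-tail area of the chi-square distribution. The snippet below reproduces it with scipy purely for illustration; it is not T-LAB's internal code.

# Relation between CHI2 and (p) at one degree of freedom.
from scipy.stats import chi2

print(round(chi2.sf(3.84, df=1), 3))   # ~0.05, the conventional threshold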

By clicking on the listed items it is possible to create various charts (see below).

1.2 - part/whole: "exclusive" lexical units

2.1 - subset/subset: "typical" lexical units

2.2 - subset/subset: "exclusive" lexical units

For each targeted subset it is also possible to check its 'typical' elementary contexts, whose specificity results from the computation of normalized TF-IDF values. More specifically, the 'score' assigned to each elementary context (see the picture below) is the sum of the TF-IDF values assigned to the words of which it consists.
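A minimal sketch of this scoring on three toy contexts follows. The normalization adopted here (dividing each TF-IDF value by the maximum within the context) is an assumption made for the example; the exact formula used by T-LAB is described in the TF-IDF glossary entry.

# Score each elementary context by summing the normalized TF-IDF values
# of its words. Contexts and normalization are hypothetical.
import math
from collections import Counter

contexts = [
    "taxes must fall and reform must start".split(),
    "reform of the welfare state".split(),
    "the state must fund the welfare state".split(),
]

n_docs = len(contexts)
doc_freq = Counter(w for ctx in contexts for w in set(ctx))

def tfidf(word, ctx):
    tf = ctx.count(word) / len(ctx)
    idf = math.log(n_docs / doc_freq[word])
    return tf * idf

scores = []
for ctx in contexts:
    values = [tfidf(w, ctx) for w in set(ctx)]
    top = max(values) or 1.0                     # guard against all-zero values
    scores.append(sum(v / top for v in values))  # normalize, then sum

# The highest-scoring context is the most 'typical' one of this small set.
for ctx, score in zip(contexts, scores):
    print(round(score, 2), " ".join(ctx))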

All the contingency tables can be easily exported and allow us to create various charts.
Moreover, by clicking on specific cells of the table (see below), it is possible to create an HTML file including all the elementary contexts in which the word in the row occurs within the corresponding subset.

Finally, by clicking the appropriate button (see below), a dictionary file with the .dictio extension is created, ready to be imported by any T-LAB tool for thematic analysis. Such a dictionary includes all the typical words of the selected categorical variable.