Thematic Analysis of Elementary Contexts
This T-LAB tool allows you to obtain and explore a representation of corpus contents through a few significant thematic clusters (from 3 to 50), each of which:
a) consists of a set of elementary contexts (i.e. sentences, paragraphs or short texts like responses to open-ended questions) characterized by the same patterns of key-words;
b) is described through the lexical units (words, lemmas or categories) and the variables (if present) most characteristic of the context units from which it is composed.
In many ways, the analysis results can be considered an isotopy map (iso = same; topoi = places), where each isotopy, as a generic or specific theme (Rastier, 2002: 204), is characterized by the co-occurrences of semantic traits.

A T-LAB dialog box (see above) allows the user to set some analysis parameters.
In particular:
- the (A) parameter allows the user to fix the maximum number of cluster partitions to be included in T-LAB outputs. Nonetheless, the clustering algorithm used stops when any further partition doesn't match statistical criteria;
- the (B) parameter allows the user to exclude from the analysis any context unit that doesn't contain a minimum number of key-words included in the list which is being used.
N.B.: Both the above parameters produce significant
changes in the analysis results only when the number of context units is very
large and/or when they are short texts.
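As a rough illustration of the (B) parameter, the sketch below drops context units that contain too few key-words. The function name, the token-list representation and the choice to count distinct key-words (rather than occurrences) are assumptions made for the example, not T-LAB internals.

```python
def filter_context_units(units, key_words, min_key_words=2):
    """Keep the units containing at least `min_key_words` distinct key-words.

    units: list of token lists (one per context unit).
    key_words: the word list currently in use.
    Counting distinct key-words is an assumption of this sketch.
    """
    vocabulary = set(key_words)
    return [u for u in units if len(vocabulary.intersection(u)) >= min_key_words]


units = [
    ["tax", "reform", "vote"],
    ["weather", "tax"],
    ["reform", "vote", "vote"],
]
kept = filter_context_units(units, ["tax", "reform", "vote"], min_key_words=2)
```

With short texts, raising the threshold removes more units, which is consistent with the note above about when this parameter matters most.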
The analysis procedure consists of the following steps:
a - construction of a data table context units x lexical units (up to 150,000 rows x 1,500 columns), with presence/absence values;
b - TF-IDF normalization and scaling of row vectors to unit length (Euclidean norm);
c - clustering of the context units (measure: cosine coefficient; method: bisecting K-means);
d - filing of the obtained partitions and, for each of them:
e - construction of a contingency table lexical units x clusters (n x k);
f - chi square test applied to all the intersections of the contingency table;
g - correspondence analysis of the contingency table lexical units x clusters.
This procedure therefore performs a type of co-occurrence analysis (steps a-b-c) and, subsequently, a type of comparative analysis (steps e-f-g). In particular, comparative analysis uses the categories of the "new variable" derived from the co-occurrence analysis (categories of the new variable = thematic clusters) to form the contingency table columns.
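Steps a and b can be sketched as follows, assuming a small presence/absence table held as a list of rows. The IDF weight log(N/df) is one common variant and is an assumption here, since the exact TF-IDF formula used by T-LAB is not stated in this section. Once rows have unit length, the cosine coefficient of step c reduces to a dot product.

```python
import math


def tfidf_unit_rows(presence):
    """Weight a 0/1 table (context units x lexical units) and scale rows to unit length."""
    n_rows = len(presence)
    n_cols = len(presence[0])
    # Number of context units in which each lexical unit is present.
    df = [sum(row[j] for row in presence) for j in range(n_cols)]
    # Assumed IDF variant: log(N / df); columns that are never present get weight 0.
    idf = [math.log(n_rows / d) if d else 0.0 for d in df]
    scaled = []
    for row in presence:
        weighted = [value * weight for value, weight in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted))
        scaled.append([x / norm if norm else 0.0 for x in weighted])
    return scaled


def cosine(u, v):
    """Cosine coefficient of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


rows = tfidf_unit_rows([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
similarity = cosine(rows[0], rows[1])
```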
N.B.: When the user decides to repeat/apply the results of a previous analysis (i.e. a Thematic Analysis of Elementary Contexts or a Modeling of Emerging Themes), T-LAB performs a comparative analysis only (steps e-f-g).
On completion of the analysis you can easily perform the following operations:
1 - explore the characteristics of the clusters;
2 - explore the relationships between the clusters;
3 - explore the relationships between clusters and variables;
4 - explore the various cluster partitions (from 3 to 50);
5 - refine the results of the chosen partition and, if necessary, repeat the
above steps (1,2,3);
6 - assign labels to the clusters;
7 - verify which elementary contexts belong to each cluster;
8 - verify the score of each elementary context within the cluster to which
it belongs;
9 - export a thematic document classification (only provided when the corpus is made up of at least 2 primary documents and when they are not short texts like responses to open-ended questions);
10 - save the selected partition for exploration with other T-LAB tools.
In detail:
1 - Explore the characteristics of the clusters
Clicking on the CHARACTERISTICS button shows the lexical units and the variable values which characterize each cluster, together with their Chi-square values and the number of elementary contexts in which each of them is found, both in the selected cluster ("IN CLUST") and in the analysed total ("IN TOT").
The "CAT" column also indicates whether the characteristic has been selected by the user ("A") with the Customized Settings function or has been suggested by T-LAB as a "supplementary" description ("S").
In the case of the chi square test the structure of the analysed table is the following:

Where:
nij refers to occurrences of word (a) within the selected cluster (A);
Nj refers to all occurrences of word (a) within the corpus (or the corpus sub-set) analysed;
Ni refers to all word occurrences within the selected cluster (A);
N refers to all word occurrences of the contingency table word by cluster.
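The four quantities defined above are enough to fill a 2 x 2 table (word a vs. other words, cluster A vs. other clusters). The sketch below computes a plain Pearson chi-square from them. Whether T-LAB applies a continuity correction or any other adjustment is not stated here, so the uncorrected formula is an assumption, and the function name is hypothetical.

```python
def chi_square_2x2(n_ij, N_j, N_i, N):
    """Pearson chi-square for one word/cluster cell, without continuity correction.

    n_ij: occurrences of the word within the selected cluster
    N_j:  occurrences of the word in the whole table
    N_i:  all word occurrences within the selected cluster
    N:    all word occurrences in the table
    """
    a = n_ij                    # word, in cluster
    b = N_j - n_ij              # word, in other clusters
    c = N_i - n_ij              # other words, in cluster
    d = N - N_j - N_i + n_ij    # other words, in other clusters
    denominator = N_j * (N - N_j) * N_i * (N - N_i)
    if denominator == 0:
        return 0.0
    return N * (a * d - b * c) ** 2 / denominator


over_represented = chi_square_2x2(n_ij=30, N_j=40, N_i=100, N=1000)
as_expected = chi_square_2x2(n_ij=4, N_j=40, N_i=100, N=1000)
```

In the second call the observed count equals the expected one (40 x 100 / 1000 = 4), so the statistic is zero.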
An HTML report (see below) is generated to permit detailed analysis of the cluster characteristics. In the report, in addition to the list of typical words, the most characteristic elementary contexts of the selected cluster are shown in descending order according to their respective score.

Pie charts and bar charts are used to verify the percentage of context units (i.e. elementary contexts) that belong to each cluster.


2 - Explore the relationships between the clusters
Some of the graphs obtained by Correspondence Analysis enable you to explore the relationships between clusters in bidimensional spaces.
More specifically:
- You can explore the various combinations of factorial axes, simply by selecting them in the appropriate boxes ("X axis", "Y axis");
- For each of the combinations (X-Y), you can display various types of elements (clusters, lemmas and variables).

All the graphs can be maximized and customized by using the appropriate dialog box (just right click on the chart). Moreover, when there are 4 or more thematic clusters, their relationships can be explored in a moving 3D view (see below).



Moreover, for every factorial axis, T-LAB supplies tables that facilitate the interpretation. These are shown after every selection in the appropriate boxes (see below).

By selecting the Complete Results option it is possible to check all the results of the Correspondence Analysis lexical units x clusters.

A specific option (see below) allows you to visualise/export the contingency table and to create charts showing the distribution of each word within the clusters and its corresponding chi-square values.
Moreover, by clicking on specific cells of the table, it is possible to create an HTML file including all elementary contexts where the word in the row is present in the corresponding cluster.


3 - Explore the relationships between clusters and variables
Bar charts allow you to verify the relationships between clusters and variables.

You can explore additional relationships between clusters and variables using the functions provided in the Factor Analysis section (see above).
4 - Explore the various cluster partitions
Because the algorithm used (bisecting K-means) produces a hierarchical clustering, the user can explore various analysis solutions: partitions from 3 to 50 clusters.
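To show why bisecting K-means yields nested partitions, here is a minimal sketch that repeatedly splits one cluster in two using cosine similarity on unit-length rows. Several choices are assumptions made only for the example: the largest cluster is always the one that gets split, seeds are chosen deterministically, and the loop stops at a fixed number of clusters rather than on the statistical criteria T-LAB applies.

```python
import math


def _unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _centroid(vectors):
    # Direction of the summed vectors, rescaled to unit length.
    return _unit([sum(column) for column in zip(*vectors)])


def bisect(rows, members, n_iter=10):
    """Split one cluster in two with a 2-means step based on cosine similarity."""
    # Assumed seeding: the first member and the member least similar to it.
    first = members[0]
    far = min(members, key=lambda i: _dot(rows[i], rows[first]))
    centers = [rows[first], rows[far]]
    left, right = [], []
    for _ in range(n_iter):
        left, right = [], []
        for i in members:
            if _dot(rows[i], centers[0]) >= _dot(rows[i], centers[1]):
                left.append(i)
            else:
                right.append(i)
        if not left or not right:
            break
        centers = [_centroid([rows[i] for i in left]),
                   _centroid([rows[i] for i in right])]
    return left, right


def bisecting_kmeans(rows, max_clusters):
    """Return the successive partitions, each one a list of index lists."""
    clusters = [list(range(len(rows)))]
    partitions = []
    while len(clusters) < max_clusters:
        parent = max(clusters, key=len)  # assumption: split the largest cluster
        if len(parent) < 2:
            break
        left, right = bisect(rows, parent)
        if not left or not right:
            break
        clusters.remove(parent)
        clusters += [left, right]
        partitions.append([sorted(c) for c in clusters])
    return partitions


data = [_unit(v) for v in [[1, 0, 0], [1, 0.1, 0], [0, 1, 0.3],
                           [0, 1, 0.2], [0, 0.3, 1], [0, 0.2, 1]]]
partitions = bisecting_kmeans(data, max_clusters=3)
```

Each partition differs from the previous one only by the bisection of a single "parent" cluster, which is what makes a dendrogram of the solutions possible.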
For each partition obtained, a specific table (see below) lists the following values:
- "Index", obtained by dividing the between-cluster variance by the total variance;
- "Gap", corresponding to the difference between the index value and the value of the immediately previous partition;
- the number of the "child" cluster obtained from the bisection of the corresponding "parent".
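The "Index" and "Gap" values can be illustrated with a small sketch that takes the between-cluster and total sums of squares around the grand mean. Using sums of squares as the variance measure is an assumption, and the function names are hypothetical.

```python
def variance_index(rows, clusters):
    """Between-cluster sum of squares divided by the total sum of squares."""
    n_cols = len(rows[0])
    grand_mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    total = sum((r[j] - grand_mean[j]) ** 2 for r in rows for j in range(n_cols))
    between = 0.0
    for members in clusters:
        mean = [sum(rows[i][j] for i in members) / len(members) for j in range(n_cols)]
        between += len(members) * sum((mean[j] - grand_mean[j]) ** 2 for j in range(n_cols))
    return between / total if total else 0.0


def gaps(indexes):
    """Difference between each index and that of the previous partition."""
    return [None] + [current - previous for previous, current in zip(indexes, indexes[1:])]


points = [[0, 0], [0, 2], [4, 0], [4, 2]]
indexes = [
    variance_index(points, [[0, 1], [2, 3]]),    # 2 clusters
    variance_index(points, [[0], [1], [2, 3]]),  # 3 clusters, one parent bisected
]
index_gaps = gaps(indexes)
```

Because each partition refines the previous one, the index can only stay the same or grow, and a small gap suggests that the additional bisection explains little extra variance.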
The Partition option allows you to easily explore the characteristics of the available clustering solutions (just click on a table item).


The dendrogram function (see below) allows you to check the tree structure of the various bisections.

5 - Refine the results of the chosen partition
After exploring different solutions, the user can refine the results of the chosen partition and, if necessary, repeat operations 1, 2 and 3 illustrated above.
In particular, this step allows the user to delete from the analysis all context units whose cluster membership doesn't fit either of the following criteria:
a) the cluster membership of the i-th context unit, determined first by bisecting K-means (unsupervised clustering) and later by a Naïve Bayes Classifier (supervised classification), must be the same;
b) the maximum posterior value (see below) corresponding to the cluster membership of the i-th context unit must be, in percentage terms, at least 50% higher than its remaining values (i.e. its posterior values in the other clusters).
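One possible reading of the two criteria is sketched below. It assumes that a context unit is kept only when both hold, and that "at least 50% higher" means the top posterior is at least 1.5 times the second highest. Both points are interpretations, and the function name and input format are hypothetical.

```python
def keep_context_unit(kmeans_cluster, posteriors, margin=0.5):
    """Decide whether a context unit survives the refinement step.

    kmeans_cluster: cluster assigned by bisecting K-means.
    posteriors: dict mapping cluster -> Naive Bayes posterior (any common scale).
    margin: 0.5 encodes the assumed reading of "at least 50% higher".
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    best_cluster, best_value = ranked[0]
    # Criterion a: both methods must agree on the cluster.
    if best_cluster != kmeans_cluster:
        return False
    # Criterion b: the top posterior must clearly dominate the runner-up.
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_value >= (1 + margin) * runner_up


clear_case = keep_context_unit(1, {1: 60.0, 2: 30.0, 3: 10.0})
ambiguous_case = keep_context_unit(1, {1: 45.0, 2: 40.0, 3: 15.0})
disagreement = keep_context_unit(2, {1: 60.0, 2: 30.0, 3: 10.0})
```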

All the results of this computation are in the following table exported by T-LAB (see below), where the posterior values for each cluster are in percentage format.

6 - Assign labels to the clusters
A specific T-LAB function allows you to assign labels to clusters.
(N.B.: The software proposes a number of labels automatically the first time you use this function.)

Labels assigned to clusters can be displayed in the various graphs available (see below).

7 - Verify which elementary contexts belong to each cluster
8 - Verify the score of each elementary context within the cluster to which it belongs
9 - Obtain a thematic document classification

These three operations are all performed through the Cluster Membership button, which lets you export three types of tables (see below) in MS Excel format:
a - "Cluster_Partitions.xls" listing the context unit correspondences for each cluster within the various partitions;

b - "Themes-Contexts.xls" (see below) listing the context unit correspondences for each cluster within the selected partition.

In particular, the relevance value (Score) assigned to each elementary context (j) belonging to the cluster (k) comes from the following formula:

Where:
Scorej = relevance value assigned to the elementary context (j);
ΣXij = sum of the Chi-square values assigned to the key-words (i) found in the elementary context in question (j) which are "typical" of the cluster (k);
nj = number of key-words (distinct words), typical of the cluster (k), found in the elementary context (j);
N = number of key-words (distinct words) typical of the cluster (k).
c - "Ec_Document_Classification.xls" (only provided when the corpus is made up of at least 2 primary documents and when they are not short texts like responses to open-ended questions) listing the mixed cluster membership of each document (see below).

In this case the values come from the above formula (see "b") by summing
the scores of elementary contexts belonging to each document and by applying
a percentage calculation.
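The aggregation just described can be sketched as follows, assuming each elementary context comes as a (document, cluster, score) triple with its score already computed. The input format and function name are hypothetical, and the percentage is taken over each document's total score, which is an assumption about how the calculation is normalized.

```python
from collections import defaultdict


def document_cluster_percentages(scored_contexts):
    """Sum context scores per document and cluster, then convert to percentages.

    scored_contexts: iterable of (document, cluster, score) triples.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for document, cluster, score in scored_contexts:
        totals[document][cluster] += score
    result = {}
    for document, by_cluster in totals.items():
        document_total = sum(by_cluster.values())
        result[document] = {
            cluster: (100.0 * value / document_total if document_total else 0.0)
            for cluster, value in by_cluster.items()
        }
    return result


memberships = document_cluster_percentages([
    ("doc1", 1, 30.0),
    ("doc1", 2, 10.0),
    ("doc1", 1, 20.0),
    ("doc2", 3, 5.0),
])
```

A document whose contexts fall in several clusters therefore receives a mixed membership profile rather than a single label.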
10 - Save the selected partition for exploration with other T-LAB tools
When you exit the Thematic Analysis of Elementary Contexts function, the software displays messages to remind you that you can use other T-LAB tools to explore the clusters obtained.
If you select Save, the < CONT_CLUST > variable (clusters of elementary contexts) remains available only for certain types of analysis (e.g. Sequences of Themes, Word Associations, Comparison between Pairs of Key-Words, Co-Word Analysis and Concept Mapping) and only until the user modifies the word list.