Modeling of Emerging Themes
This T-LAB tool provides a simple way of discovering, examining and modeling
the main themes or topics (henceforward 'theme' and 'topic' are used
synonymously) emerging from texts.
The themes can subsequently be explored with several other tools, either by keeping
qualitative and quantitative approaches separate or by combining them.
In fact, themes - which are described through their characteristic vocabulary
and consist of co-occurrence patterns of key-terms
- can be used as categories in further analyses or for automatically classifying
the context units (i.e. documents or elementary contexts).
The only parameter (see below) that the user can set is the number of themes
to be obtained, which is fixed in advance. Note that the higher this number, the more
consistent the co-occurrence patterns; moreover, if necessary, some themes
(e.g. those that are redundant or difficult to interpret) can be discarded later.

The analysis procedure consists of the following steps:
a - construction of a co-occurrence matrix (depending on the case, either a
document by word or an elementary context by word matrix);
b - data analysis by a probabilistic model which uses Latent Dirichlet Allocation
and Gibbs Sampling (see the related information on Wikipedia: http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation;
http://en.wikipedia.org/wiki/Gibbs_sampling);
c - description of themes by means of the probability of their characteristic
words, either "specific" or "shared" by two or more themes.
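Steps a to c can be illustrated with a minimal sketch of collapsed Gibbs sampling for Latent Dirichlet Allocation. This is not T-LAB's implementation: the function names (gibbs_lda, describe), the hyperparameters alpha and beta, the number of iterations and the toy context units are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def gibbs_lda(units, n_themes, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Step b (sketch): collapsed Gibbs sampling for LDA on tokenized context units."""
    rng = random.Random(seed)
    vocab_size = len({w for unit in units for w in unit})
    unit_theme = [[0] * n_themes for _ in units]       # tokens of unit d assigned to theme k
    theme_word = [Counter() for _ in range(n_themes)]  # tokens of word w assigned to theme k
    theme_total = [0] * n_themes                       # all tokens assigned to theme k
    assign = []

    def update(d, w, k, delta):
        unit_theme[d][k] += delta
        theme_word[k][w] += delta
        theme_total[k] += delta

    # Start from a random theme for every token.
    for d, unit in enumerate(units):
        assign.append([rng.randrange(n_themes) for _ in unit])
        for w, k in zip(unit, assign[d]):
            update(d, w, k, +1)

    for _ in range(n_iter):
        for d, unit in enumerate(units):
            for i, w in enumerate(unit):
                update(d, w, assign[d][i], -1)  # take the token out of the counts
                weights = [
                    (unit_theme[d][k] + alpha)
                    * (theme_word[k][w] + beta)
                    / (theme_total[k] + vocab_size * beta)
                    for k in range(n_themes)
                ]
                k = rng.choices(range(n_themes), weights=weights)[0]
                assign[d][i] = k
                update(d, w, k, +1)             # put it back under the sampled theme
    return unit_theme, theme_word


def describe(theme_word, k, n=3):
    """Step c (sketch): the n most probable words of theme k with their probabilities."""
    total = sum(theme_word[k].values())
    return [(w, c / total) for w, c in theme_word[k].most_common(n) if c > 0]


# Hypothetical context units (toy data).
texts = [
    "market price market trade",
    "price trade bank market",
    "school teacher student school",
    "student teacher exam school",
]
units = [text.split() for text in texts]
matrix = [Counter(unit) for unit in units]  # step a: context unit by word counts
unit_theme, theme_word = gibbs_lda(units, n_themes=2)
for k in range(2):
    print(k, describe(theme_word, k))
```

Because the sampler is stochastic, the themes obtained depend on the seed and on the number of iterations; the sketch only shows how each token is repeatedly reassigned to a theme according to the current counts.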
On completion of the analysis you can easily perform the following operations:
1 - explore, rename and remove the characteristics of each theme;
2 - rename or discard specific themes;
3 - test the model by a Naïve Bayes Classifier which assigns context units (i.e. documents and/or elementary contexts) to themes;
4 - apply the model and visualize the relationships between the different themes.
In detail:
1 - Explore, rename and remove the characteristics of each theme

In this chart (see above) "high probability" indicates a probability >= .75.

By clicking on each theme label (see "A" above), tables and charts can be visualized (see "B" above); moreover, by clicking on words in the table (see "C" above), their distribution within the various themes is displayed and a "remove" option is available.
The columns of the table read as follows:
IN THEME = tokens of each word within the selected theme;
TOT = total tokens of each word within the corpus (or the subset) analysed;
IN (%) = percentage values of each word within the selected theme;
(p) = probability value of each word over themes;
TYPE = specific when the word belongs to the selected theme only (i.e. p=1); shared in the other cases.
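The relationship between these columns can be sketched as follows. The formulas are assumptions, not taken from the T-LAB documentation: (p) is computed here as IN THEME / TOT, which is consistent with p=1 for a word that occurs in one theme only, and IN (%) as the share of the theme's tokens accounted for by the word. The function name theme_table and the counts are hypothetical.

```python
def theme_table(counts, theme):
    """Build the table rows for one theme.

    counts: dict mapping theme label -> dict of word -> tokens assigned to that theme.
    Assumes every token of the corpus is assigned to exactly one theme.
    """
    theme_tokens = sum(counts[theme].values())
    rows = []
    for word, in_theme in counts[theme].items():
        if in_theme == 0:
            continue
        tot = sum(words.get(word, 0) for words in counts.values())
        p = in_theme / tot  # assumed definition of (p)
        rows.append({
            "WORD": word,
            "IN THEME": in_theme,
            "TOT": tot,
            "IN (%)": 100 * in_theme / theme_tokens,  # assumed definition of IN (%)
            "(p)": p,
            "TYPE": "specific" if p == 1 else "shared",
        })
    return rows


# Hypothetical token counts for two themes.
counts = {
    "THEME_1": {"market": 8, "price": 2},
    "THEME_2": {"price": 6, "school": 4},
}
rows = theme_table(counts, "THEME_1")
for row in rows:
    print(row)
```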
By selecting the complete results option (see "B" above) an HTML file is
created which includes all themes and their characteristic vocabulary;
moreover, two XLS files can be saved.

When the "shared words" option is selected (see below) it is possible
to explore the corresponding table and create a chart for each item selected.

2 - Rename or discard specific themes
In order to rename or discard specific themes, just select one of them (see "A"
below) and click on the "rename/remove" button (see "B" below).
When the appropriate box appears, depending on your goals, you can change the
label by choosing among the available words or by typing a new label in the
appropriate field (see "C" below); otherwise you can discard the selected
theme just by clicking on the corresponding button (see "D" below).

3 - Test the Model
At the end of the analysis procedure (see points "a" and "b" above)
each context unit (i.e. primary document or elementary context) is
represented as a mixture of different topics; by contrast, the Naïve
Bayes Classifier used in this step assigns each context unit to the single
topic which is most characteristic of it.
For this reason, when the "Test the Model"
option is selected, T-LAB creates an HTML file and two XLS files including the
classification of context units (see below).
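The difference between a mixture of topics and a single assignment can be sketched with a minimal Naïve Bayes rule: each context unit goes to the theme with the highest log prior plus summed log word probabilities. This is only an illustration under stated assumptions; the theme labels, the probabilities, the function name classify and the floor value used for unseen words are hypothetical and do not describe T-LAB's internal procedure.

```python
import math


def classify(tokens, word_prob, prior, floor=1e-9):
    """Return the theme maximizing log P(theme) + sum of log P(word | theme).

    Words not listed for a theme receive a small floor probability (an assumption)
    so that the logarithm stays defined.
    """
    scores = {}
    for theme, p_theme in prior.items():
        score = math.log(p_theme)
        for w in tokens:
            score += math.log(word_prob[theme].get(w, floor))
        scores[theme] = score
    return max(scores, key=scores.get)


# Hypothetical model: P(word | theme) and P(theme).
word_prob = {
    "ECONOMY": {"market": 0.6, "price": 0.4},
    "EDUCATION": {"school": 0.7, "price": 0.3},
}
prior = {"ECONOMY": 0.5, "EDUCATION": 0.5}

print(classify(["price", "market"], word_prob, prior))
print(classify(["school", "price"], word_prob, prior))
```

Note that the shared word "price" does not decide the assignment by itself here; the words that are specific to one theme dominate the score.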




4 - Apply the model
After the model has been applied and saved (see "A" below), the results
of the analysis can be immediately visualised by means of an MDS map.

Moreover, given that after exiting from the analysis (see "B" above) themes are recorded as clusters of context units (i.e. like the Thematic Analysis of Elementary Contexts and Thematic Classification of Documents results), the new thematic variables just created (i.e. CONT_CLUST and/or DOC_CLUST) can be explored by using various T-LAB tools (see below).

For example, you can perform a Correspondence Analysis of themes (see below),
produce a network map (see below) by using the Sequence of Themes tool,

obtain a Word Associations map by using the corresponding T-LAB tool (see below), and so on.
