Sequence Analysis
This T-LAB tool allows a Markovian analysis of two kinds of sequences:
A) those concerning the lexical units (words, lemmas or categories) in the network defined by the corpus or by its subsets (see the CORPUS button in the following image);
B) those recorded in an external file made by the user (see the FILE button and the explanation at the end of this section).

In case (A), sequences are syntagmatic relationships between the lexical units under analysis. Each unit, at each of its occurrences within the corpus chain, has a predecessor and a successor: respectively, the lexical unit that comes before it and the lexical unit that comes after it.
Starting from a matrix in which all the predecessors and all the successors of each lexical unit are recorded, T-LAB calculates the transition probabilities (Markov chains) between the lexical units under analysis (max 1,500).
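As a minimal sketch of the calculation described above (not T-LAB's actual code), the following Python snippet counts each unit's successors in a toy sequence and normalises the counts into transition probabilities. The function name transition_probabilities and the sample data are hypothetical, and the sketch assumes that every unit in the chain is part of the analysis.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate first-order transition probabilities (unit -> successor)."""
    # Count every adjacent pair (unit, successor) in the chain.
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # Count how many times each unit is followed by something.
    totals = Counter(sequence[:-1])
    probs = defaultdict(dict)
    for (unit, successor), n in pair_counts.items():
        probs[unit][successor] = n / totals[unit]
    return dict(probs)

# Toy chain (hypothetical data).
chain = ["a", "b", "a", "c", "a", "b"]
successors = transition_probabilities(chain)
# Reversing the chain turns successors into predecessors.
predecessors = transition_probabilities(chain[::-1])

# Successors of "a", sorted by descending probability.
ranked = sorted(successors["a"].items(), key=lambda item: -item[1])
```

Running the same function on the reversed sequence gives the predecessor probabilities, because each unit's predecessor in the original chain becomes its successor in the reversed one.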
The outputs available - all clickable - are graphs and tables.
In the graphs, the lexical units closest to the selected one are those with the highest probability of coming before it (predecessors) or after it (successors).


Two tables show the sorted list of predecessors (the first) and successors (the second) of each selected lexical unit.
The list is in descending order according to the probability values ("PROB"). For example, in the following table, the probability that "cost" will follow "healt_care" is equal to 0.105, that is 10.5%.

The "triads" option displays tables of three-element sequences in which, according to the user's choice, the selected word is in the first, second or third position. For each triad, T-LAB shows the corresponding occurrence values.
N.B. Empty words are not included in the triads.
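The triad tables can be illustrated with a short Python sketch. It is not T-LAB's implementation: the function name triads and the toy token list are hypothetical, and the sketch assumes that empty words have already been removed from the sequence.

```python
from collections import Counter

def triads(sequence, word, position):
    """Count three-element sequences with `word` at index 0, 1 or 2."""
    all_triads = Counter(zip(sequence, sequence[1:], sequence[2:]))
    return {t: n for t, n in all_triads.items() if t[position] == word}

# Toy sequence, assumed to be already stripped of empty words.
tokens = ["a", "b", "c", "a", "b", "c", "a", "b", "d"]

first = triads(tokens, "a", 0)   # "a" in the first position
last = triads(tokens, "c", 2)    # "c" in the third position
```

Here first maps each triad beginning with "a" to its number of occurrences, which corresponds to the occurrence values shown in the tables.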

According to graph theory, the predecessors and the successors of each node (in this case, each lexical unit) can be represented by means of arrows (arcs) coming in (in-degree = types of predecessors) or going out (out-degree = types of successors).

As an example, in the following table "people" has 167 types of successors and 187 types of predecessors.
The ratio between the two (successors/predecessors) indicates the semantic variety engendered by each node:
- if the ratio is greater than 1, the node is defined "source";
- if the ratio is equal to 1, the node is defined "relay";
- if the ratio is lower than 1, the node is defined "well".
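The classification above can be sketched in a few lines of Python. The function names degrees and classify are hypothetical, and the handling of nodes without predecessors (raising an error) is an assumption, since the text does not say what happens in that case.

```python
def degrees(sequence, unit):
    """Return (in_degree, out_degree) as counts of distinct neighbour types."""
    pairs = list(zip(sequence, sequence[1:]))
    predecessors = {a for a, b in pairs if b == unit}
    successors = {b for a, b in pairs if a == unit}
    return len(predecessors), len(successors)

def classify(out_degree, in_degree):
    """Label a node from its successors/predecessors ratio."""
    if in_degree == 0:
        # Assumption: the ratio is undefined without predecessors.
        raise ValueError("ratio undefined without predecessors")
    if out_degree > in_degree:    # ratio greater than 1
        return "source"
    if out_degree == in_degree:   # ratio equal to 1
        return "relay"
    return "well"                 # ratio lower than 1

# Values of "people" from the example: 167 successors, 187 predecessors.
role = classify(167, 187)
```

With the values given for "people", the ratio is 167/187, roughly 0.89, so the node falls into the "well" class.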
In the same table, for each lexical unit, the column "cover" (coverage) indicates the percentage of its occurrences preceded or followed by lexical units included in the user list.

When the analysed units cover the totality of those present within the corpus (e.g. use of categories for content analysis and/or use of external files), the cover value is equal to 1; otherwise, it is lower.
Moreover, when the cover value is equal to 1, the sums of the probability values (both of predecessors and of successors) are also equal to 1; otherwise, they are lower.
In both cases, the residual percentage is due to predecessors and successors that are not included in the analysis.
For example, the sequence represented in the following image consists of 39 events: of these, only 16 (the hypothetical units in analysis) are "covered" (gray boxes).
That is because some of them, e.g. those corresponding to the occurrences of the lexical unit "A", have predecessors and successors not included in the analysis (white boxes).
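The link between coverage and the sum of the probabilities can be illustrated with a small Python sketch. It shows one possible reading of the description, limited to the successor side, and is not T-LAB's actual formula: the function name successor_probabilities and the toy events are hypothetical, and probabilities are assumed to be normalised by all occurrences of the unit that have a successor, whether or not that successor is analysed.

```python
from collections import Counter

def successor_probabilities(sequence, unit, analysed):
    """Probabilities of the analysed successors of `unit`."""
    # Every event that follows an occurrence of `unit`.
    followers = [b for a, b in zip(sequence, sequence[1:]) if a == unit]
    # Keep only successors that belong to the analysed list.
    counts = Counter(f for f in followers if f in analysed)
    return {f: n / len(followers) for f, n in counts.items()}

# Toy sequence: "x", "y" and "z" stand for units outside the analysis.
events = ["A", "x", "A", "B", "y", "A", "B", "A", "z"]
analysed = {"A", "B"}

probs = successor_probabilities(events, "A", analysed)
covered_share = sum(probs.values())
```

In this toy case only two of the four successors of "A" are analysed, so the probabilities sum to 0.5 rather than 1; the remaining 0.5 is the residual share due to successors outside the analysis.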

By contrast, when the user analyses an external file, all the events are covered.
N.B. In order to analyse an external file, the user must place a Sequence.dat file in the work folder; then, after opening an existing project, they must select Sequence Analysis ("user" option).
The calculation method, the graphs and the tables are analogous to those already described (see above).
The Sequence.dat file, which can contain various kinds of tags (e.g. names of speakers in a conversation, categories obtained by content analysis, kinds of events, etc.), must be made up of "N" lines (min 50, max 10,000), each containing a single tag of at most 50 characters, without punctuation marks or blank spaces.
The file can contain at most 250 tag types.
Here are some lines of Sequence.dat files in the correct format:
Hamlet
event_01
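Before placing a file in the work folder, the constraints listed above can be checked with a short Python sketch. The function name check_tags is hypothetical, and treating the underscore as an allowed character is an assumption based on the "event_01" example; T-LAB's own validation may differ.

```python
import string

# Assumption: the underscore is allowed, as in the "event_01" example.
FORBIDDEN = set(string.punctuation) - {"_"}

def check_tags(tags):
    """Return a list of problems found in the lines of a Sequence.dat file."""
    problems = []
    if not 50 <= len(tags) <= 10_000:
        problems.append(f"{len(tags)} lines (expected 50 to 10,000)")
    if len(set(tags)) > 250:
        problems.append(f"{len(set(tags))} tag types (max 250)")
    for number, tag in enumerate(tags, start=1):
        if not tag or len(tag) > 50:
            problems.append(f"line {number}: tag must have 1 to 50 characters")
        elif any(ch.isspace() or ch in FORBIDDEN for ch in tag):
            problems.append(f"line {number}: punctuation or blank space")
    return problems

# Toy data: 50 lines alternating between two tags.
valid = check_tags(["event_01", "event_02"] * 25)
too_short = check_tags(["Hamlet"] * 49)
```

The function works on a list of lines that has already been read; one way to obtain it from a file is Path("Sequence.dat").read_text().splitlines() from the pathlib module.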
Both for sequences concerning the corpus lexical units and for those included in an external file (Sequence.dat), T-LAB produces four tables in the MY-OUTPUT folder:
- T_Successors.xls, with the transition probabilities of the successors;
- T_Predecessors.xls, with the transition probabilities of the predecessors;
- Frequency_Average_Order.xls, only provided when the corpus consists of short texts like responses to open-ended questions, with the frequency and the average order of appearance (or evocation) of each term;
- Adjacency_Matrix.xls, only provided when the list of lexical units includes up to 250 items, which can be used to generate other measures and graphs typical of Network Analysis.
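To make the content of the frequency and average order table more concrete, here is a hedged Python sketch. It assumes that the "average order of appearance" of a term is the mean of its positions (1 = first) across all its occurrences in the short texts; the exact formula used by T-LAB is not stated here, and the function name and sample responses are hypothetical.

```python
from collections import defaultdict

def frequency_and_average_order(responses):
    """Map each term to (frequency, mean position of appearance)."""
    positions = defaultdict(list)
    for response in responses:
        # Position 1 is the first term evoked in a response.
        for rank, term in enumerate(response, start=1):
            positions[term].append(rank)
    return {term: (len(ranks), sum(ranks) / len(ranks))
            for term, ranks in positions.items()}

# Toy responses to an open-ended question (hypothetical data).
responses = [
    ["health", "cost", "family"],
    ["cost", "health"],
    ["family", "health", "work"],
]
table = frequency_and_average_order(responses)
```

A term with high frequency and low average order is one that many respondents mention early.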

Moreover, T-LAB can export GraphML files, which can be edited with the yEd software (see below).
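For readers who want to build a comparable file from their own transition tables, the following Python sketch writes a minimal GraphML document with the standard library. It does not reproduce the files exported by T-LAB: the function name to_graphml, the "prob" edge attribute and the sample transitions are hypothetical, and yEd-specific extensions such as visible node labels are left out.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(transitions):
    """Serialise {(source, target): probability} as a directed GraphML graph."""
    root = ET.Element("graphml", xmlns=NS)
    # Declare an edge attribute holding the transition probability.
    ET.SubElement(root, "key", {"id": "prob", "for": "edge",
                                "attr.name": "prob", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    names = sorted({s for s, _ in transitions} | {t for _, t in transitions})
    for name in names:
        ET.SubElement(graph, "node", id=name)
    for (source, target), prob in sorted(transitions.items()):
        edge = ET.SubElement(graph, "edge", source=source, target=target)
        ET.SubElement(edge, "data", key="prob").text = str(prob)
    return ET.tostring(root, encoding="unicode")

# Toy transitions (hypothetical data).
xml_text = to_graphml({("a", "b"): 0.75, ("a", "c"): 0.25})
```

The resulting string can be saved with a .graphml extension and opened in any editor that reads plain GraphML.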
