What T-LAB does and what it enables us to do
T-LAB software is an all-in-one set of linguistic and statistical tools for text analysis which can be used in research fields such as Semantic Analysis, Content Analysis, Perceptual Mapping, Text Mining and Discourse Analysis.
In fact, being a text laboratory, T-LAB allows the integrated use of three kinds of tools for text analysis:

A - tools for word co-occurrence analysis: computation of word associations, comparisons between word pairs, co-word analysis and concept mapping, sequence analysis, concordances;
B - tools for thematic analysis of the context units: modeling of emerging themes, thematic analysis of elementary contexts (i.e. chunks, sentences or paragraphs), sequences of themes, key contexts of thematic words, thematic classification of documents;
C - tools for comparative analysis of two or more corpus subsets: specificity analysis, correspondence analysis, multiple correspondence analysis, cluster analysis.
The interface is user-friendly, and various types of texts can be analysed:
- a single text (e.g. an interview, a book, etc.);
- a set of texts (e.g. a set of interviews, web pages, newspaper articles, responses to open-ended questions, etc.).
All texts can be encoded with categorical variables and/or with ID numbers that correspond to context units or cases (e.g. responses to open-ended questions).
Each corpus (one or more texts) must be in plain text (.txt) and cannot exceed 30 MB (about 18,000 pages in ASCII format).
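The two constraints above (file extension and maximum size) can be checked before importing a corpus. The following Python sketch is only an illustration and is not part of T-LAB; the function name `check_corpus_file` is hypothetical, and the size limit is assumed to mean 30 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: the stated limit is read here as 30 * 1024 * 1024 bytes.
MAX_CORPUS_BYTES = 30 * 1024 * 1024


def check_corpus_file(path):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    p = Path(path)
    if p.suffix.lower() != ".txt":
        problems.append(f"extension is '{p.suffix}', expected '.txt'")
    if not p.is_file():
        problems.append("file does not exist")
    elif p.stat().st_size > MAX_CORPUS_BYTES:
        problems.append(f"file is {p.stat().st_size} bytes, limit is {MAX_CORPUS_BYTES}")
    return problems
```

For example, `check_corpus_file("interviews.txt")` returns an empty list when the file exists, has the ".txt" extension and is within the assumed size limit.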
Six steps are all that is required to quickly check the software's functionalities:
1 - Select the language of the interface and that of the corpus to be analysed

2 - Select any corpus to analyse

3 - Click "GO" in the first Setup window
During the pre-processing phase, T-LAB carries out the following treatments:
4 - Select a tool from one of the "Analysis" sub-menus

5 - Verify the results


6 - Use the contextual help function to interpret the various graphs and tables

The following information is provided to help the user to better understand what T-LAB does and how to make full use of it.
From the user's point of view, the software is operated through the interface, that is through the main menu, its sub-menus and the options they contain.
Apart from the user interface, the T-LAB system is organized into two main components:
To understand how T-LAB works and how it can be used, it is essential to have a clear idea as to which analysis units are filed in its database and what statistical algorithms are used in the various analyses. In fact, the analysed data tables always consist of rows and columns the headings of which correspond to the analysis units filed in the database, while the algorithms regulate the processes that make it possible to detect significant relationships between the data and to extract useful information.
The analysis units used in T-LAB are of two types: lexical units and context units.
A - the lexical units are words and multi-words, filed and classified on the basis of a criterion. More precisely, in the T-LAB database each lexical unit consists of a classified record with two fields: word and lemma. In the first field ("word"), the words are listed as they appear in the corpus, while the second ("lemma") lists the labels attributed to groups of lexical units, which are classified according to linguistic criteria (e.g. lemmatization) or by dictionaries and semantic grids defined by the user.
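As an illustration of the two-field record described above, the following Python sketch models a lexical unit and groups word forms under their labels. It does not reproduce the actual T-LAB database; the class `LexicalUnit`, the function `group_by_lemma` and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    word: str   # the form as it appears in the corpus
    lemma: str  # the label of the group the form is classified under


def group_by_lemma(units):
    """Map each lemma to the sorted list of word forms classified under it."""
    groups = defaultdict(set)
    for unit in units:
        groups[unit.lemma].add(unit.word)
    return {lemma: sorted(words) for lemma, words in groups.items()}


# Hypothetical records: the first three follow a linguistic criterion
# (lemmatization), the last two a user-defined semantic category.
units = [
    LexicalUnit("speaks", "speak"),
    LexicalUnit("spoke", "speak"),
    LexicalUnit("speaking", "speak"),
    LexicalUnit("mother", "family"),
    LexicalUnit("father", "family"),
]
print(group_by_lemma(units))
```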
B - the context units are portions of text that the corpus can be divided into. More precisely, according to T-LAB logic, there can be three types of context units:
B.1 primary documents, which correspond to the "natural" subdivision of the corpus (e.g. interviews, articles, answers to open-ended questions, etc.), that is the initial context defined by the user;
B.2 elementary contexts, which correspond to syntagmatic units (i.e. chunks, sentences, paragraphs) into which each primary document can be subdivided;
B.3 corpus subsets, which correspond to groups of primary documents that belong to the same category (e.g. interviews with "men" or "women", articles from a specific year or a particular magazine, and so on), including thematic clusters of documents or elementary contexts obtained by using the corresponding T-LAB tools (see section 5 C below).
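The three types of context units can be illustrated with a short Python sketch. It is a simplified stand-in, not the segmentation T-LAB performs: the sample documents, the variable name `gender` and the naive sentence splitting on final punctuation are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical primary documents, each tagged with one categorical variable.
documents = [
    {"id": 1, "gender": "women", "text": "I like my job. The hours are long."},
    {"id": 2, "gender": "men", "text": "The pay is fair."},
    {"id": 3, "gender": "women", "text": "My team is great! We work well together."},
]


def elementary_contexts(text):
    """Split a document into sentences (a naive stand-in for real segmentation)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def corpus_subsets(docs, variable):
    """Group document ids by the value of a categorical variable."""
    subsets = defaultdict(list)
    for doc in docs:
        subsets[doc[variable]].append(doc["id"])
    return dict(subsets)


print(elementary_contexts(documents[0]["text"]))
print(corpus_subsets(documents, "gender"))
```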
Starting from this database organization, T-LAB makes it possible - in automatic mode - to explore and to analyse the relationships between the analysis units of the whole corpus or its subsets.
In T-LAB, the selection of any analysis tool (click of the mouse) always activates a semi-automatic process that, with a few simple operations, generates an input table, applies some statistical algorithms and produces some outputs.
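To make the notion of an input table concrete, the following Python sketch builds a small table with one row per context unit and one column per lexical unit. It is only a sketch under simple assumptions (whitespace tokenization, lower-casing, raw counts) and does not describe how T-LAB builds its tables; `occurrence_table` and the sample data are hypothetical.

```python
from collections import Counter


def occurrence_table(contexts, keywords):
    """Return one row per context unit with the count of each keyword."""
    table = []
    for text in contexts:
        counts = Counter(text.lower().split())
        table.append([counts[k] for k in keywords])
    return table


contexts = ["work is work", "home and family", "family and work"]
keywords = ["work", "family"]
table = occurrence_table(contexts, keywords)

# Print the table with its column headings (the lexical units).
print(keywords)
for row in table:
    print(row)
```

A statistical algorithm would then take such a table as input, and the headings of its rows and columns are the analysis units described above.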
Let's consider how a typical work project which uses T-LAB can be managed.
Hypothetically, each project consists of a set of analytical activities (operations) which have the same corpus as their subject and are organized according to the user's strategy and plan. It begins with gathering the texts to be analysed and concludes with a report.
The succession of the various phases is illustrated in the following diagram:

N.B.
- The six numbered phases, from the corpus preparation to the interpretation of the outputs, are supported by T-LAB tools and are always reversible;
- By using T-LAB automatic settings it is possible to avoid two phases (3 and 4); however, in order to achieve high quality results, their use is nevertheless advisable.
1 - CORPUS PREPARATION: transformation of the texts to be analysed into a file (corpus) that can be processed by the software.
Each corpus which is to be analysed, in order to be imported into T-LAB, must be in the ASCII/ANSI format with the ".txt" extension.
In the case of a single text (or a corpus considered as a single text) T-LAB needs no further work.
Otherwise, if there are coding marks referring to some variables in the corpus, in the preparation phase some criteria must be observed (see the Corpus Preparation section).
At the end of the corpus preparation phase it is recommended that a new folder be created which contains only the corpus to be imported.
2 - CORPUS IMPORTATION: a series of automatic processes that transform the corpus into a set of tables integrated in the T-LAB database.
Starting from the selection of the New Corpus option, the intervention of the user (advanced options) is required in order to define certain choices (see below):

N.B.:
- The language selection (obligatory) defines the lemmatization to be applied. Currently automatic lemmatization is available in five languages: Italian, French, English, Spanish and Portuguese. In any case, without automatic lemmatization and/or using customized dictionaries, texts in all the languages (or dialects) that support ASCII characters can be analysed (see the "other" option above);
- Inexperienced users are advised to accept the preselected options;
- As the pre-processing options determine both the kind and the number of analysis units (i.e. context units and lexical units), different choices determine different analysis results. For this reason, all T-LAB outputs (i.e. charts and tables) shown in the user's manual and in the on-line help are just indicative.
3 - THE USE OF LEXICAL TOOLS allows us to verify the correct recognition of the lexical units and to customize their classification, that is to verify and to modify the automatic choices made by T-LAB.
The procedures of the various interventions are illustrated in the corresponding help items (and in the manual).
In particular the user is requested to refer to the corresponding help item (and to the manual) for a detailed description of the Dictionary Building process (see below).

4 - THE KEY-WORD SELECTION consists of the arrangement of one or more lists of lexical units (words, lemmas or categories) to be used for producing the data tables to be analysed.
The automatic settings option provides the lists of the key-words selected by T-LAB; nevertheless, since the choice of the analysis units is extremely relevant in relation to subsequent elaborations, the use of customized settings (see below) is highly recommended. In this way the user can choose to modify the list suggested by T-LAB and/or to arrange lists that better correspond to the objectives of his research.

In any case, while creating these lists, the user can refer to the following criteria:
- check the quantitative (total of the occurrences) and qualitative importance of the various items;
- check the limitations of the analytical tools that you intend to use (see at the end of this chapter);
- check whether the set of items is compatible with your own research strategies (see item 5 below).
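The quantitative criterion (total occurrences) can be sketched in Python as a simple frequency filter. This is an illustrative assumption about how a candidate key-word list might be drawn up by hand, not the selection rule T-LAB applies; `select_keywords` and the default threshold of 2 are hypothetical.

```python
from collections import Counter


def select_keywords(lemmas, min_occurrences=2):
    """Keep the lemmas that reach the threshold, most frequent first."""
    counts = Counter(lemmas)
    selected = [(lemma, n) for lemma, n in counts.items() if n >= min_occurrences]
    # Sort by descending occurrences, then alphabetically to break ties.
    return sorted(selected, key=lambda item: (-item[1], item[0]))


lemmas = ["work", "family", "work", "home", "family", "work"]
print(select_keywords(lemmas))
```

The qualitative importance of each item still has to be judged by the user; a frequency threshold only narrows the list of candidates.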
5 - THE USE OF ANALYSIS TOOLS allows the user to obtain outputs (tables and graphs) that represent significant relationships between the analysis units and enables the user to make inferences.
At the moment (7.1 version), T-LAB includes fifteen different analysis tools, each of them having its own specific logic; that is, each one generates specific tables, uses specific algorithms and produces specific outputs.
Consequently, depending on the structure of texts to be analysed and on the goals to be achieved, each time the user has to decide which tools are more appropriate for his analysis strategy.
For this purpose, besides the distinction between tools for co-occurrence, comparative and thematic analysis (see below), it can be useful to consider that some of the latter allow us to obtain new corpus subsets which can be included in further analysis steps.
In particular, the Modeling of Emerging Themes, Thematic Analysis of Elementary Contexts and Thematic Document Classification tools allow us to find clusters of context units characterized by similarity in meaning. These clusters, like categories obtained by a content analysis, can be used in co-occurrence or in comparative analysis of corpus subsets.

Even though the various T-LAB tools can be used in any order, there are nevertheless three ideal starting points in the system which correspond to the three ANALYSIS sub-menus:
A : TOOLS FOR CO-OCCURRENCE ANALYSIS
These tools enable us to analyse different kinds of relationships between lexical units (i.e. words).

According to the types of relationships to be analysed, the T-LAB options indicated in this diagram use one or more of the following statistical tools: Association Indexes, Chi Square Tests, Cluster Analysis, Multidimensional Scaling and Markov chains.
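As an illustration of what an association index measures, the following Python sketch computes the cosine coefficient between two words from their co-occurrences within context units. The choice of the cosine coefficient here is an assumption made for the example (the text above does not say which indexes each option uses), and `cosine_association` and the sample contexts are hypothetical.

```python
import math


def cosine_association(contexts, word_a, word_b):
    """Cosine index: co-occurrences / sqrt(occurrences of a * occurrences of b).

    Occurrences are counted as the number of context units containing the word.
    """
    token_sets = [set(c.lower().split()) for c in contexts]
    n_a = sum(word_a in tokens for tokens in token_sets)
    n_b = sum(word_b in tokens for tokens in token_sets)
    n_ab = sum(word_a in tokens and word_b in tokens for tokens in token_sets)
    if n_a == 0 or n_b == 0:
        return 0.0
    return n_ab / math.sqrt(n_a * n_b)


contexts = [
    "work and family",
    "work and pay",
    "family and home",
    "work family balance",
]
print(cosine_association(contexts, "work", "family"))
```

The index ranges from 0 (the two words never share a context unit) to 1 (they always appear together).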
Here are some output examples:

- Comparison between Word Pairs

- Co-Word Analysis and Concept Mapping


B : TOOLS FOR COMPARATIVE ANALYSIS
These tools enable us to analyse different kinds of relationships between context units.

Specificity Analysis enables us to check which words are typical or exclusive of a specific corpus subset, either comparing it with the rest of the corpus or with another subset.
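The idea of comparing a subset with the rest of the corpus can be sketched with a chi-square statistic on a 2x2 table. This is an assumption for illustration only: the text above does not state which measure the Specificity Analysis tool uses, and `chi_square_2x2` and the sample counts are hypothetical.

```python
def chi_square_2x2(word_in_subset, total_subset, word_in_rest, total_rest):
    """Chi-square statistic (no continuity correction) for a word in a subset vs. the rest.

    The 2x2 table is:
                      word    other words
        subset         a          b
        rest           c          d
    """
    a = word_in_subset
    b = total_subset - word_in_subset
    c = word_in_rest
    d = total_rest - word_in_rest
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts: the word occurs 30 times in a 1,000-word subset
# and 10 times in the remaining 1,000 words of the corpus.
score = chi_square_2x2(30, 1000, 10, 1000)
print(round(score, 3))
```

A high value indicates that the word is distributed unevenly between the subset and the rest; whether it is over- or under-used in the subset is read from the comparison of the two proportions.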


Correspondence Analysis allows us to explore similarities and differences between (and within) groups of context units.


Cluster Analysis, which requires a previous Correspondence Analysis, can be carried out using various techniques.


C : TOOLS FOR THEMATIC ANALYSIS
In either of the above cases, "themes" are clusters of context units characterized by the same patterns of key-words.
These tools enable us to discover, examine and map "themes" emerging from texts.
As theme is a polysemous word, when using software tools for thematic analysis we have to refer to operational definitions. More precisely, in these T-LAB tools, "theme" is a label used to indicate three different entities:
1 - a specific ("thematic") key term used for extracting a set of elementary contexts in which it is associated with a specific group of words pre-selected by the user (see the Key Contexts of Thematic Words tool);
2 - a "thematic" cluster of context units characterized by the same patterns of key-words (see the Thematic Analysis of Elementary Contexts and Thematic Document Classification tools);
3 - a mixture component of a probabilistic model which represents each context unit (i.e. elementary context or document) as generated from a fixed number of topics or "themes" (see the Modeling of Emerging Themes tool).

In detail:
- through the Key Contexts of Thematic Words tool (see below), which uses the cosine coefficient as similarity measure, we can extract lists of meaningful elementary contexts which allow us to deepen the thematic value of specific key terms.
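The use of the cosine coefficient as a similarity measure can be illustrated by ranking elementary contexts against a set of thematic words. The Python sketch below is a simplified illustration and not the tool's actual procedure; `cosine`, `rank_contexts`, the raw-count vectors and the sample data are assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine coefficient between two sparse count vectors (Counter objects)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_contexts(contexts, thematic_words):
    """Return (score, context) pairs, most similar to the thematic words first."""
    profile = Counter(thematic_words)
    scored = [(cosine(Counter(c.lower().split()), profile), c) for c in contexts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


contexts = ["family and work", "the weather is nice", "work work work"]
ranked = rank_contexts(contexts, ["work", "family"])
for score, context in ranked:
    print(round(score, 3), context)
```

Contexts that contain more of the thematic words, in proportions closer to the profile, appear at the top of the list.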


- through the Modeling of Emerging Themes tool (see below), which uses a Bayesian method, the mixture components - described through their characteristic vocabulary - can be used as categories in qualitative analyses or for the automatic classification of the context units (i.e. documents or elementary contexts).


- both the Thematic Analysis of Elementary Contexts and the Thematic Document Classification tools work in the following way:
a - perform co-occurrence analysis to identify thematic clusters;
b - perform comparative analysis of the profiles of the various clusters;
c - generate various types of graphs and tables (see below);
d - allow you to file the new variables (thematic clusters) for further analysis.



6 - INTERPRETATION OF THE OUTPUTS consists in the consultation of the tables and the graphs produced by T-LAB, in the possible customization of their format and in making inferences on the meaning of the relationships they represent.
In the case of tables, according to each case, T-LAB allows the user to export them in files with the following extensions: .DAT, .TXT, .XLS, .HTML. This means that, by using any text editor program and/or any Microsoft Office application, the user can easily import and re-elaborate them.
All graphs and charts can be zoomed, maximized, customized and exported in different formats (right click to show popup menu).



Some general criteria for the interpretation of the T-LAB outputs are illustrated in a paper quoted in the Bibliography (Lancia F.: 2005) and are available from the www.tlab.it website. This document presents the hypothesis that the statistical elaboration outputs (tables and graphs) are particular types of texts, that is they are multi-semiotic objects characterized by the fact that the relationships between the signs and the symbols are ordered by measures that refer to specific codes.
In other words, both in the case of texts written in "natural language" and those written in the "statistical language", the possibility of making inferences on the relationships that organize the content forms is guaranteed by the fact that the relationships between the expression forms are not random; in fact, in the first case (natural language) the significant units follow on and are ordered in a linear manner (one after the other in the chain of the discourse), while in the second case (tables and graphs) the organization of the multidimensional semantic spaces comes from statistical measures.
Even if the semantic spaces represented in the T-LAB maps are extremely varied, and each of them requires specific interpretative procedures, we can theorize that - in general - the logic of the inferential process is the following:
A - to detect some significant relationships between the units "present" on the expression plane (e.g. between table and/or graph labels);
B - to explore and compare the semantic traits of the same units and the contexts to which they are mentally and culturally associated (content plane);
C - to generate some hypotheses or some analysis categories that, in the context defined by the corpus, account for the relationships between expression and content forms.
