T-LAB PLUS 2022 - ON-LINE HELP

Corpus and Subsets

A corpus is a collection of one or more texts selected for analysis.

Each corpus subset is defined by a category of a variable.

T-LAB makes it possible to explore and analyse the relationships between the analysis units, either in the whole corpus or in its subsets.
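The idea that a subset is simply the set of documents sharing one category of a variable can be sketched in a few lines of Python. The `Document` class, the variable names (`YEAR`, `SOURCE`) and the `subset` function are hypothetical illustrations, not T-LAB's internal data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One document with its variable -> category labels (hypothetical structure)."""
    text: str
    variables: dict[str, str] = field(default_factory=dict)


def subset(corpus: list[Document], variable: str, category: str) -> list[Document]:
    """Return the documents whose `variable` takes the value `category`."""
    return [doc for doc in corpus if doc.variables.get(variable) == category]


# Placeholder corpus: three articles labelled with two hypothetical variables.
corpus = [
    Document("first article ...", {"YEAR": "2020", "SOURCE": "daily"}),
    Document("second article ...", {"YEAR": "2021", "SOURCE": "weekly"}),
    Document("third article ...", {"YEAR": "2021", "SOURCE": "daily"}),
]

articles_2021 = subset(corpus, "YEAR", "2021")
print(len(articles_2021))  # 2
```

Because every subset is just a filter on one category, the same corpus can be partitioned in different ways simply by choosing a different variable.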

Some examples of corpora:

  • a single text or document on any subject;
  • a set of press articles on the same subject;
  • one or more interviews carried out within the same research project;
  • a set of answers to an open-ended question;
  • one or more focus group transcripts.

Some examples of subsets:

  • one or more chapters of a book;
  • one or more newspaper articles published in the same year;
  • one or more interviews with people belonging to the same category;
  • a subset of answers to an open-ended question.

N.B.: The "thematic clusters" of documents or elementary contexts obtained with the corresponding T-LAB tools are also corpus subsets.

When a corpus consists of more than one text, all of its parts must share two features that make them comparable, so that the corpus can be analysed correctly as a whole:

a) thematic and/or contextual homogeneity of their content;

b) balanced sizes, measured both in occurrences and in Kbytes.
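A rough way to check criterion b) before importing a corpus is to compare the sizes of its parts. This sketch assumes that whitespace-separated tokens approximate occurrences (T-LAB's own tokenization will give different counts) and that Kbytes means UTF-8 bytes divided by 1024. The function names are hypothetical, and no particular acceptable ratio is implied.

```python
import math


def size_profile(texts: dict[str, str]) -> dict[str, tuple[int, float]]:
    """Map each text name to (occurrences, Kbytes).

    Occurrences are approximated by whitespace-separated tokens; this is a
    simplification, since T-LAB applies its own tokenization rules.
    """
    return {
        name: (len(body.split()), len(body.encode("utf-8")) / 1024)
        for name, body in texts.items()
    }


def imbalance_ratio(profile: dict[str, tuple[int, float]], index: int) -> float:
    """Largest / smallest value of one size measure (0 = occurrences, 1 = Kbytes)."""
    values = [sizes[index] for sizes in profile.values()]
    smallest = min(values)
    return math.inf if smallest == 0 else max(values) / smallest


texts = {
    "interview_A": "word " * 1200,  # placeholder text: 1200 tokens, 6000 bytes
    "interview_B": "word " * 300,   # placeholder text: 300 tokens, 1500 bytes
}
profile = size_profile(texts)
print(profile["interview_B"])       # (300, 1.46484375)
print(imbalance_ratio(profile, 0))  # 4.0
print(imbalance_ratio(profile, 1))  # 4.0
```

A large ratio on either measure suggests that one part may dominate the analysis; deciding what counts as "balanced" is left to the analyst.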

In T-LAB's logic, the corpus is a database organised in records and fields: the records are the recorded entities (texts, text segments, words), and the fields are the labels used to classify those entities (text authors, reference contexts, word types, etc.).
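The records-and-fields view can be pictured as a simple table in which each row is a recorded entity and each field holds a classifying label. This is a minimal sketch, not T-LAB's actual storage format; the `Record` class, the field names (`author`, `context`, `word_type`) and `label_counts` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    """One recorded entity plus the labels (fields) that classify it."""
    entity_type: str  # kind of entity: "text", "segment" or "word"
    value: str        # the recorded entity itself
    fields: dict[str, str] = field(default_factory=dict)  # field name -> label


def label_counts(records: list[Record], field_name: str) -> Counter:
    """Count the records carrying each label of one field; records without it are skipped."""
    return Counter(r.fields[field_name] for r in records if field_name in r.fields)


# Placeholder table mixing text-level and word-level records.
table = [
    Record("text", "Interview 1 ...", {"author": "A", "context": "school"}),
    Record("text", "Interview 2 ...", {"author": "B", "context": "school"}),
    Record("text", "Interview 3 ...", {"author": "A", "context": "family"}),
    Record("word", "teacher", {"word_type": "noun"}),
    Record("word", "learn", {"word_type": "verb"}),
]

print(label_counts(table, "context"))  # Counter({'school': 2, 'family': 1})
```

Keeping labels in named fields lets the same table be grouped by any classification (author, context, word type) without changing the records themselves.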

See Corpus Preparation.