Corpus and Subsets
A corpus is a collection of one or more texts selected for analysis.
Each corpus subset is defined by one category of a variable.
T-LAB lets you explore and analyse the relationships between the analysis units of the whole corpus or of its subsets.


Some corpus examples:
Some subset examples:
N.B.: Further corpus subsets are the "thematic clusters" of documents or elementary contexts obtained by using the corresponding T-LAB tools.
To be imported into T-LAB, the corpus must be saved as an ASCII/ANSI file with the .txt extension.
When a corpus is made up of more than one text, it can be analysed correctly only if all of its parts share two features that make them comparable:
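The file requirement above can be checked before import with a short script. The following is only a sketch and not part of T-LAB: the function name `check_corpus_file` is hypothetical, "ANSI" is assumed here to mean the Windows-1252 code page (`cp1252`), and the UTF-8 test is a heuristic rather than a definitive detection.

```python
def check_corpus_file(filename, data, ansi_codec="cp1252"):
    """Return a list of problems; an empty list means the basic checks passed.

    filename   -- name of the corpus file
    data       -- raw bytes read from that file
    ansi_codec -- assumed meaning of "ANSI" (Windows-1252 by default)
    """
    problems = []

    # The corpus file is expected to carry the .txt extension.
    if not filename.lower().endswith(".txt"):
        problems.append("extension is not .txt")

    # Pure ASCII content is valid in every ANSI code page.
    if data.isascii():
        return problems

    # Heuristic: non-ASCII bytes that form valid UTF-8 suggest the file
    # was saved as UTF-8 rather than as ANSI.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    else:
        problems.append("content looks like UTF-8, not ANSI")
        return problems

    # Otherwise the bytes should at least be decodable in the assumed code page.
    try:
        data.decode(ansi_codec)
    except UnicodeDecodeError:
        problems.append("content is not valid " + ansi_codec)

    return problems
```

To use it, read the file in binary mode (for example with `pathlib.Path(name).read_bytes()`) and pass the file name and the bytes to the function.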
a) a thematic and/or contextual homogeneity of their content;
b) a balanced relationship between their sizes, in terms of both occurrences and Kbytes.
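Requirement b) can be inspected roughly before building the corpus. The sketch below is an illustration under stated assumptions, not T-LAB's own procedure: occurrences are approximated by whitespace-separated tokens, Kbytes are measured on the `cp1252` encoding of each text, and the helper names `size_report` and `imbalance_ratio` are hypothetical. What ratio counts as "balanced" is left to the analyst.

```python
def size_report(texts, codec="cp1252"):
    """Map each text name to (occurrences, kbytes).

    texts -- dict of {name: text content}
    Occurrences are approximated by whitespace-separated tokens (assumption).
    """
    report = {}
    for name, text in texts.items():
        occurrences = len(text.split())
        kbytes = len(text.encode(codec, errors="replace")) / 1024
        report[name] = (occurrences, kbytes)
    return report


def imbalance_ratio(values):
    """Return largest / smallest value; infinity if the smallest is zero."""
    values = list(values)
    smallest = min(values)
    if smallest == 0:
        return float("inf")
    return max(values) / smallest


# Hypothetical example with two very short texts.
texts = {
    "text_a": "one two three",
    "text_b": "one two three four five six",
}
report = size_report(texts)
occurrence_ratio = imbalance_ratio(occ for occ, _ in report.values())
kbyte_ratio = imbalance_ratio(kb for _, kb in report.values())
print(report)
print(occurrence_ratio, kbyte_ratio)
```

A ratio close to 1 on both measures indicates parts of similar size; a large ratio suggests that one part may dominate the analysis.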
In T-LAB logic, the corpus is a database organised into records and fields. More precisely, records hold the recorded entities (texts, text segments, words), and fields hold the labels used to classify those entities (text authors, reference contexts, word types, etc.).
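The record-and-field view can be pictured with a small data structure. This is a conceptual sketch only and does not reflect T-LAB's internal storage: the `Record` class, the `subset` helper, and the field names `author` and `context` are all hypothetical. It also shows how a subset follows from one category of a variable.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One recorded entity (a text, a text segment, or a word) with its labels."""
    entity: str
    fields: dict = field(default_factory=dict)


def subset(records, variable, category):
    """Return the records whose label for `variable` equals `category`."""
    return [r for r in records if r.fields.get(variable) == category]


# Hypothetical corpus of three texts classified by two variables.
corpus = [
    Record("Text of the first document", {"author": "A", "context": "press"}),
    Record("Text of the second document", {"author": "B", "context": "press"}),
    Record("Text of the third document", {"author": "A", "context": "speech"}),
]

# The subset defined by the category "A" of the variable "author".
by_author_a = subset(corpus, "author", "A")
print(len(by_author_a))
```

Each variable plays the role of a field, and each of its categories selects one subset of the records.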
See Corpus Preparation.