New Corpus
The New Corpus option starts the import process, through which T-LAB transforms the text file prepared by the user into a set of tables integrated with the session database.
The main phases of this process are the following:
· Corpus Normalization;
· Multi-Word and Stop-Word detection;
· Elementary Context Segmentation;
· Automatic Lemmatization;
· Vocabulary building;
· Key-Term selection.
To start the process, it is first necessary to select the file to be imported (see the following image):

Subsequently, a setup form appears (see below) in which the user can make their choices.
N.B.:
- As the pre-processing options determine both the kind and the number of analysis units (i.e. context units and lexical units), different choices (see the advanced options below) produce different analysis results. For this reason, all T-LAB outputs (i.e. charts and tables) shown in the user's manual and in the on-line help are indicative only.


1 - AUTOMATIC LEMMATIZATION
Automatic lemmatization is enabled only for the language matching the user's interface; therefore, when the user intends to analyse texts in a different language, the "other" option has to be selected.
The result of the lemmatization process can be verified by means of the Vocabulary function and can be modified by means of the Dictionary Building function.
2 - TEXT SEGMENTATION (ELEMENTARY CONTEXTS)
Depending on the user's choice, the elementary contexts used for the computation of co-occurrences can be of four types: sentences, chunks of comparable length, paragraphs or short texts (e.g. responses to open-ended questions).
The corpus_segments.dat file allows the user to verify the result of corpus segmentation.
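As a rough illustration of what three of these segmentation types mean in practice, the following Python sketch splits a text into paragraphs, sentences and chunks of comparable length. It is only a minimal sketch under stated assumptions: the function names, the blank-line paragraph rule, the punctuation-based sentence rule and the max_words threshold are all hypothetical and do not describe the rules that T-LAB actually applies.

```python
import re

def split_paragraphs(text):
    """Treat every block separated by a blank line as one context (assumption)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(text):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_chunks(text, max_words=40):
    """Group whole sentences into chunks; a chunk is closed before it would exceed max_words."""
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "First sentence here. Second one follows!\n\nNew paragraph starts. It ends?"
paragraphs = split_paragraphs(sample)
sentences = split_sentences(sample)
chunks = split_chunks(sample, max_words=6)
```

Since the choice of context unit changes how many units the same text yields, co-occurrence counts computed on sentences and on paragraphs are not directly comparable.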
3 - MULTI-WORD CHECK
The "Basic" option activates the automatic use of the T-LAB multi-word list.
The "Advanced" option, which is enabled only with automatic lemmatization, allows the user:
- to verify and modify the list of multi-words not included in the T-LAB database;
- to import and use customized lists (Multiwords.txt files).

4 - STOP-WORD CHECK
The "Basic" option activates the automatic use of the T-LAB stop-word list.
By contrast, the "Advanced" option allows the user:
- to verify and modify the list of stop-words within the corpus;
- to import and use customized lists (StopWords.txt files).

5 - KEY-TERM SELECTION
The available options allow the user to choose the selection method (TF-IDF or Chi-Square) and the maximum number of lexical units to be included in the list used by T-LAB for analysing texts with automatic settings.
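To give an idea of how a TF-IDF based selection with a maximum list size can work, the following Python sketch ranks the terms of a small set of context units and keeps the top ones. It is an assumption-laden illustration, not T-LAB's procedure: the function name select_key_terms, the whitespace tokenization, the specific weighting (term frequency multiplied by log(N / document frequency), summed over contexts) and the alphabetical tie-breaking are all hypothetical choices.

```python
import math
from collections import Counter

def select_key_terms(contexts, max_terms=5):
    """Rank terms by TF-IDF summed over all contexts and keep at most max_terms."""
    tokenized = [c.lower().split() for c in contexts]
    n_contexts = len(tokenized)

    # Document frequency: number of contexts in which each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Sum tf * idf over contexts; terms present in every context score 0.
    scores = Counter()
    for tokens in tokenized:
        for term, freq in Counter(tokens).items():
            scores[term] += freq * math.log(n_contexts / df[term])

    # Highest score first; ties are broken alphabetically for reproducibility.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]

contexts = ["the cat sat", "the dog sat", "the cat ran"]
top_two = select_key_terms(contexts, max_terms=2)
all_terms = select_key_terms(contexts, max_terms=5)
```

In this toy example, the terms that occur in only one context rank first, while a term that occurs in every context receives a score of zero and ranks last.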