www.tlab.it

New Corpus


The New Corpus option starts the import process, through which T-LAB transforms the text file prepared by the user into a set of tables integrated with the session database.

The main phases of this process are the following:

· Corpus Normalization;

· Multi-Word and Stop-Word detection;

· Elementary Context Segmentation;

· Automatic Lemmatization;

· Vocabulary building;

· Key-Term selection.

To start the process, it is first necessary to select the file to be imported (see the following image):

 

A setup form then appears (see below), in which the user can make their choices.

N.B.:
- As the pre-processing options determine both the kind and the number of analysis units (i.e. context units and lexical units), different choices (see the advanced options below) produce different analysis results. For this reason, all T-LAB outputs (i.e. charts and tables) shown in the user's manual and in the on-line help are indicative only.

 

1 - AUTOMATIC LEMMATIZATION

Automatic lemmatization is enabled only for the language that matches the user's interface; therefore, when the user intends to analyse texts in a different language, the "other" option must be selected.
The result of the lemmatization process can be verified by means of the Vocabulary function and can be modified by means of the Dictionary Building function.

2 - TEXT SEGMENTATION (ELEMENTARY CONTEXTS)

Depending on the user's choice, the elementary contexts used for computing co-occurrences can be of four types: sentences, chunks of comparable length, paragraphs or short texts (e.g. responses to open-ended questions).
The corpus_segments.dat file allows the user to verify the result of corpus segmentation.
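
As a rough illustration of three of the segmentation types listed above, the following Python sketch splits a text into sentences, paragraphs or word chunks. The function names, the punctuation rule and the max_words parameter are assumptions made for this example; they do not describe the rules T-LAB actually applies.

```python
import re

def split_sentences(text):
    # Naive rule (assumption): a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_paragraphs(text):
    # Paragraphs are assumed to be separated by one or more blank lines.
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]

def split_chunks(text, max_words=40):
    # Chunks of comparable length: consecutive groups of at most max_words words.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

sample = "First sentence. Second one!\n\nNew paragraph here? Yes."
print(split_sentences(sample))
print(split_paragraphs(sample))
print(split_chunks(sample, max_words=3))
```

Whatever the segmentation type, each resulting unit plays the role of one elementary context within which co-occurrences are counted.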

3 - MULTI-WORD CHECK

The "Basic" option activates the automatic use of the T-LAB multi-word list.

The "Advanced" option, enabled with automatic lemmatization only, allows the user:
- to verify and modify the list of multi-words not included in the T-LAB database;
- to import and use customized lists (Multiwords.txt files).
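
To give an idea of what using a multi-word list involves, the sketch below replaces each listed word sequence with a single token. The merge_multiwords function, the underscore used as joining character and the in-memory list are all assumptions made for this example; the format of Multiwords.txt files is not described here and is not reproduced.

```python
def merge_multiwords(tokens, multiwords):
    """Replace each listed word sequence with one underscore-joined token."""
    # Try longer expressions first so that they win over their own prefixes.
    patterns = sorted((tuple(m.split()) for m in multiwords), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for pat in patterns:
            if pat and tuple(tokens[i:i + len(pat)]) == pat:
                out.append("_".join(pat))
                i += len(pat)
                break
        else:
            # No expression starts at this position: keep the single word.
            out.append(tokens[i])
            i += 1
    return out

tokens = "the prime minister of new york city".split()
print(merge_multiwords(tokens, ["prime minister", "new york", "new york city"]))
```

After such a step, each multi-word is counted as one lexical unit rather than as separate words.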

4 - STOP-WORD CHECK

The "Basic" option activates the automatic use of the T-LAB stop-word list.

The "Advanced" option instead allows the user:
- to verify and modify the list of stop-words within the corpus;
- to import and use customized lists (StopWords.txt files).
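
The effect of a stop-word list can be pictured with the minimal sketch below, which drops listed words from a sequence of tokens. The remove_stopwords function and the case-insensitive comparison are assumptions made for this example; how T-LAB treats stop-words internally, and the format of StopWords.txt files, are not described here.

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stop-word list (case-insensitive)."""
    stop = {w.lower() for w in stopwords}
    return [t for t in tokens if t.lower() not in stop]

print(remove_stopwords(["The", "corpus", "is", "ready"], ["the", "is"]))
```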

5 - KEY-TERM SELECTION

The available options allow the user to choose the selection method (TF-IDF or Chi-Square) and the maximum number of lexical units to be included in the list that T-LAB uses for analysing texts with automatic settings.
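
The idea of ranking lexical units by TF-IDF and keeping at most a given number of them can be sketched as follows. This is a generic textbook formulation (relative term frequency multiplied by the logarithm of the inverse document frequency, summed over elementary contexts); the select_key_terms function, its max_terms parameter and the summing step are assumptions made for this example, and the exact formula used by T-LAB is not stated here. The Chi-Square method is not covered by this sketch.

```python
import math
from collections import Counter

def select_key_terms(contexts, max_terms=3):
    """contexts: list of token lists. Returns the top terms by summed TF-IDF."""
    n = len(contexts)
    # Document frequency: number of contexts in which each term occurs.
    df = Counter()
    for tokens in contexts:
        df.update(set(tokens))
    # Sum, over all contexts, of (relative frequency) * log(n / df).
    scores = Counter()
    for tokens in contexts:
        tf = Counter(tokens)
        for term, count in tf.items():
            scores[term] += (count / len(tokens)) * math.log(n / df[term])
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]

contexts = [["cat", "sat", "mat"], ["cat", "ran"], ["cat", "sat"]]
print(select_key_terms(contexts, max_terms=2))
```

In this toy example the word "cat" occurs in every context, so its score is zero and it is ranked last: a term that appears everywhere does not help to distinguish one context from another.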