www.tlab.it

Dictionary building


The Dictionary Building option opens a window in which the user can carry out several operations on the corpus dictionary.

Users can rename or group the available lemmas; they can also export the dictionary or import a customized dictionary (possibly supplied by a third party).

The starting point is a table (the Corpus Dictionary) that reports the following information:

- word/lemma correspondences;
- word occurrences;
- labels referring to the automatic lemmatization (see the "INF" column).

Before any operation, it is possible to check the concordances of specific words by selecting them (double click) and using the Contexts button.
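
What the Contexts button displays can be approximated with a keyword-in-context (KWIC) listing. The Python sketch below is only an illustration under stated assumptions: the function name `concordances`, its `width` parameter and the whole-word, case-insensitive matching are hypothetical choices, not T-LAB's implementation.

```python
import re

def concordances(text, word, width=30):
    """Hypothetical KWIC helper (not part of T-LAB): return one line per
    whole-word, case-insensitive occurrence of `word`, with up to
    `width` characters of context on each side."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines

sample = "The bank raised its rates. We sat on the river bank all day."
for line in concordances(sample, "bank", width=20):
    print(line)
```

Reading such lines before renaming helps reveal whether a form such as "bank" is used with more than one meaning.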

The possible operations, though different in their goals (revising the lemmatization and/or applying grids for content analysis), all result in a reorganization of the T-LAB database, creating the tables on which the analyses are based. Therefore, these operations should concern the words (lemmas or categories) considered relevant to the subsequent analyses. To this end, T-LAB provides a further option, Customized Settings, with which users decide which lemmas to "keep" and which to "discard".
The two functions (Dictionary Building and Customized Settings) are closely interconnected, and users can easily move from one to the other, for example to revise their choices.

In Dictionary building there are two operating modalities:

- "one-by-one", with direct changes (selecting and typing) in the "Lemma" column;

- "by groups", with the possibility of moving selected words (double click) to the box on the right and then renaming them with the "replace" option.

In the second case, the new label can be chosen from the selected lemmas (just click on an item in the box on the right) or by typing a new label in the appropriate box.
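
The effect of the two modalities on the data can be sketched in Python, assuming the Corpus Dictionary is modelled as a mapping from each word to its lemma and number of occurrences. The names `sample_dictionary`, `rename_word`, `replace_group` and `lemma_frequencies`, as well as the sample words and counts, are hypothetical; the sketch only shows that renaming or grouping changes the lemma-level totals on which later analyses are based.

```python
from collections import Counter

def sample_dictionary():
    """Hypothetical Corpus Dictionary: word -> (lemma, occurrences).
    Sample data for illustration only."""
    return {
        "child": ("child", 12),
        "children": ("child", 9),
        "kid": ("kid", 4),
        "kids": ("kid", 3),
    }

def rename_word(dictionary, word, new_lemma):
    """'One-by-one' modality: type a new lemma for a single word."""
    _, occurrences = dictionary[word]
    dictionary[word] = (new_lemma, occurrences)

def replace_group(dictionary, selected_lemmas, new_label):
    """'By groups' modality: every word whose lemma was moved to the
    right-hand box receives the label chosen with "replace"."""
    for word, (lemma, occurrences) in list(dictionary.items()):
        if lemma in selected_lemmas:
            dictionary[word] = (new_label, occurrences)

def lemma_frequencies(dictionary):
    """Occurrences summed by lemma (or category)."""
    totals = Counter()
    for lemma, occurrences in dictionary.values():
        totals[lemma] += occurrences
    return dict(totals)

d = sample_dictionary()
print(lemma_frequencies(d))  # {'child': 21, 'kid': 7}
# The new label is chosen among the selected lemmas, as in the box on the right.
replace_group(d, {"child", "kid"}, "child")
print(lemma_frequencies(d))  # {'child': 28}
```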

To import a customized dictionary, the file (Dictio.diz or Dictionary.diz) must be located in the folder of the corpus being analysed.
It can be made up of "n" lines, each containing a pair of strings separated by the character ";".
A string (word, lemma or category) can be at most 50 characters long and must contain neither blank spaces nor apostrophes.

For each pair, the first string (on the left) is the label (lemma or category) defined by the user; the second is the corresponding word (Dictio.diz) or lemma (Dictionary.diz) already present in the T-LAB dictionary.

These are some examples of pairs, as they may appear in Dictio.diz or Dictionary.diz files:

ACCEPT;accept
ACCEPT;accepted
ACCEPT;accepting
ACCEPT;accepts

CHILD;child
CHILD;children
WOMAN;woman
WOMAN;women

BIOTECH;biotech
BIOTECH;biotechnology

ABSTRACT_THOUGHT;distinctness
ABSTRACT_THOUGHT;distinguish
ABSTRACT_THOUGHT;diversification
ABSTRACT_THOUGHT;diversif
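
Before importing, a file can be checked against the format rules given above (one pair per line, ";" as separator, at most 50 characters per string, no blank spaces or apostrophes). The Python function below is a hypothetical pre-import check, not a T-LAB tool; skipping empty lines and reading the file as UTF-8 are assumptions, since the required encoding is not stated here.

```python
MAX_LENGTH = 50  # maximum length of a word, lemma or category

def parse_diz(text):
    """Hypothetical checker: return the (label, item) pairs of a .diz file,
    raising ValueError at the first line that breaks the format rules."""
    pairs = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # ignore leading/trailing whitespace on the line
        if not line:
            continue  # assumption: empty lines are skipped
        fields = line.split(";")
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected exactly one ';'")
        for field in fields:
            if not field:
                raise ValueError(f"line {number}: empty string")
            if len(field) > MAX_LENGTH:
                raise ValueError(f"line {number}: {field!r} is longer than {MAX_LENGTH} characters")
            if any(ch.isspace() for ch in field) or "'" in field:
                raise ValueError(f"line {number}: {field!r} contains a blank or an apostrophe")
        pairs.append((fields[0], fields[1]))
    return pairs

def read_diz(path):
    # Assumption: UTF-8; adjust the encoding to match how the file was saved.
    with open(path, encoding="utf-8") as f:
        return parse_diz(f.read())
```

For instance, `parse_diz("ACCEPT;accepted\nCHILD;children")` returns `[('ACCEPT', 'accepted'), ('CHILD', 'children')]`, while a line such as `ABSTRACT THOUGHT;distinguish` is rejected because of the blank space.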

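According to the type of file imported, the changes are as follows:

- Dictio.diz: each listed word is assigned to the label on its left;
- Dictionary.diz: all the words currently assigned to each listed lemma are moved to the label on its left.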
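
The difference between the two imports can be sketched in Python, assuming the T-LAB dictionary is represented as a plain mapping from each word to its current lemma. The function names and sample data are hypothetical; pairs whose second string is not already in the dictionary are simply ignored here, following the requirement that it refer to a word or lemma already present.

```python
def apply_dictio(word_to_lemma, pairs):
    """Dictio.diz: the second string is a word; only that word is
    assigned to the label."""
    updated = dict(word_to_lemma)
    for label, word in pairs:
        if word in updated:  # words not in the dictionary are ignored
            updated[word] = label
    return updated

def apply_dictionary(word_to_lemma, pairs):
    """Dictionary.diz: the second string is a lemma; every word currently
    assigned to that lemma is moved to the label."""
    # If a lemma is listed twice, the last pair wins (an assumption).
    relabel = {lemma: label for label, lemma in pairs}
    return {word: relabel.get(lemma, lemma) for word, lemma in word_to_lemma.items()}

words = {"child": "child", "children": "child", "woman": "woman", "women": "woman"}
pairs = [("CHILD", "child")]
print(apply_dictio(words, pairs))
# {'child': 'CHILD', 'children': 'child', 'woman': 'woman', 'women': 'woman'}
print(apply_dictionary(words, pairs))
# {'child': 'CHILD', 'children': 'CHILD', 'woman': 'woman', 'women': 'woman'}
```

The same pair therefore relabels a single word form in Dictio.diz but a whole lemma in Dictionary.diz.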

N.B.:
- The button with the floppy disk icon saves a file (Dictio.diz) ready to be reused as a T-LAB dictionary, including after it has been modified and set up by the user;

- Using the Save your settings option (see Customized Settings), the same corpus can be analysed with several dictionaries (up to a maximum of 10) without importing them again;

- Using the Lemmatized Corpus option, it is possible to export a copy of the corpus (.txt file) in which every word is replaced by the corresponding lemma or category;
- When the dictionary has been modified, subsequent analyses on the same corpus are available only as customized settings.
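
The Lemmatized Corpus export and the saved Dictio.diz file can both be pictured with a short Python sketch, assuming the dictionary is available as a mapping from each word to its lemma or category. Tokenizing on `\w+`, lower-casing words before lookup, leaving unknown words unchanged and writing UTF-8 are assumptions made for illustration; T-LAB's own segmentation and file encoding may differ, and the function names are hypothetical.

```python
import re

def lemmatize_text(text, word_to_lemma):
    """Replace each word with its lemma or category; punctuation and
    words missing from the mapping are left unchanged (assumption)."""
    def substitute(match):
        word = match.group(0)
        return word_to_lemma.get(word.lower(), word)
    return re.sub(r"\w+", substitute, text)

def dictio_lines(word_to_lemma):
    """'label;word' lines, the pair format used by Dictio.diz."""
    return [f"{lemma};{word}" for word, lemma in sorted(word_to_lemma.items())]

def save_dictio(path, word_to_lemma):
    # Assumption: UTF-8 text, one pair per line.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(dictio_lines(word_to_lemma)) + "\n")

mapping = {"children": "CHILD", "accepted": "ACCEPT", "women": "WOMAN"}
print(lemmatize_text("Children accepted the women.", mapping))
# CHILD ACCEPT the WOMAN.
```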