Words and Lemmas
Any text analysis software first identifies the so-called raw forms, that is, the strings of letters separated by blank spaces. Then, depending on its specific algorithms or on the categories defined by the analyst, the software recognizes lexemes, keywords, etc.
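As a minimal illustration of the first step, the sketch below splits a text into raw forms on blank spaces and counts them. It is not T-LAB's own routine: the function name `raw_forms` and the sample sentence are assumptions made for the example, and punctuation handling is deliberately left out.

```python
from collections import Counter


def raw_forms(text):
    """Return the strings of characters separated by blank spaces."""
    # str.split() with no argument splits on any run of whitespace
    # and discards empty strings.
    return text.split()


# Hypothetical sample text, used only for illustration.
sample = "the cats sat and the cat slept"
counts = Counter(raw_forms(sample))
```

Here "cats" and "cat" are counted as two distinct raw forms; grouping them under one label is the job of the later recognition step.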
For every lexical unit in the corpus database, T-LAB tables provide two types of information:
· the first, named "word", contains the lexical units (single words or multi-words) transcribed as the "strings" recognized by the software;
· the second, named "lemma", contains the labels (or tags) used for grouping and classifying the lexical units.
Depending on the case, a lemma can be:
- the result of the automatic lemmatization process;
- an item of a "customized dictionary";
- a category grouping synonyms;
- a content analysis category;
- etc.
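The "word" / "lemma" pairing can be pictured as a simple two-column table. The sketch below is only an illustration under stated assumptions, not T-LAB's implementation: the names `build_table`, `group_by_lemma`, `auto_lemmas` and `custom_labels` are hypothetical, the lookup tables are invented sample data, and the rule that a customized label takes priority over automatic lemmatization is an assumption made for the example.

```python
from collections import defaultdict


def build_table(words, auto_lemmas, custom_labels):
    """Pair each word with the label used to group and classify it."""
    table = []
    for word in words:
        if word in custom_labels:
            # Assumed priority: a customized label (dictionary item,
            # synonym category, content analysis category) wins.
            lemma = custom_labels[word]
        else:
            # Otherwise use the automatic lemma; an unknown word
            # keeps itself as its label.
            lemma = auto_lemmas.get(word, word)
        table.append((word, lemma))
    return table


def group_by_lemma(table):
    """Collect the distinct words that share the same lemma."""
    groups = defaultdict(list)
    for word, lemma in table:
        if word not in groups[lemma]:
            groups[lemma].append(word)
    return dict(groups)


# Hypothetical sample data, used only for illustration.
auto_lemmas = {"cats": "cat", "sat": "sit"}
custom_labels = {"happy": "JOY", "glad": "JOY"}
words = ["cats", "cat", "happy", "glad", "sat"]

table = build_table(words, auto_lemmas, custom_labels)
groups = group_by_lemma(table)
```

In this sample, "cats" and "cat" share the lemma "cat" through automatic lemmatization, while "happy" and "glad" are grouped under the category label "JOY".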