www.tlab.it

Analysis Unit


The analysis units used in T-LAB are of two types: lexical units and context units.

A. - lexical units are words and multi-words, stored and classified according to a given criterion. More precisely, in the T-LAB database each lexical unit consists of a classified record with two fields: word and lemma. The first field ("word") lists the words as they appear in the corpus. The second field ("lemma") lists the labels assigned to groups of lexical units, classified either according to linguistic criteria (e.g. lemmatization) or by dictionaries and semantic grids defined by the user.
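The word/lemma record structure can be pictured with a small Python sketch. This is not T-LAB's internal format: the `LexicalUnit` class, the `LEMMA_DICT` mapping (a tiny hand-written stand-in for lemmatization or a user dictionary), and the helper functions are all hypothetical names used only to show how several word forms end up classified under one lemma label.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """One record of a hypothetical lexical table (not T-LAB's real schema)."""
    word: str   # the form as it appears in the corpus
    lemma: str  # the label of the group the word is classified under


# Hypothetical user-defined dictionary standing in for lemmatization.
# Multi-words such as "new_york" are treated as single lexical units.
LEMMA_DICT = {
    "says": "say",
    "said": "say",
    "saying": "say",
    "new_york": "new_york",
}


def build_lexical_table(words):
    """Create one record per distinct word; words missing from the dictionary keep themselves as lemma."""
    return [LexicalUnit(w, LEMMA_DICT.get(w, w)) for w in sorted(set(words))]


def group_by_lemma(table):
    """Collect the word forms classified under each lemma label."""
    groups = defaultdict(list)
    for unit in table:
        groups[unit.lemma].append(unit.word)
    return dict(groups)


if __name__ == "__main__":
    tokens = ["she", "said", "he", "says", "new_york", "said"]
    print(group_by_lemma(build_lexical_table(tokens)))
```

Here "said" and "says" share the record label "say", while forms not covered by the dictionary simply keep their own spelling as lemma; a real lemmatizer or semantic grid would replace the dictionary lookup.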


B. - context units are the portions of text into which the corpus can be divided. More precisely, according to T-LAB logic, there are three types of context units:

B.1 primary documents, which correspond to the "natural" subdivision of the corpus (e.g. interviews, articles, answers to open-ended questions, etc.), that is, the initial contexts defined by the user;
B.2 elementary contexts, which correspond to the syntagmatic units (i.e. chunks, sentences, paragraphs) into which each primary document can be subdivided;
B.3 corpus subsets, which correspond to groups of primary documents that belong to the same "category" (e.g. interviews with "men" or "women", articles from a specific year or a particular magazine, and so on).
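The relationship between the three kinds of context units can be sketched in Python. This is an illustration under stated assumptions, not T-LAB's segmentation procedure: the `PrimaryDocument` class, the `categories` field, and the helper functions are hypothetical, sentences are found with a naive punctuation rule, and documents lacking a value for the chosen variable are placed in a made-up "undefined" group.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PrimaryDocument:
    """A hypothetical primary document: an id, its text, and categorical variables."""
    doc_id: str
    text: str
    categories: dict = field(default_factory=dict)  # e.g. {"gender": "women"}


def elementary_contexts(doc):
    """Split a primary document into sentence-like chunks.

    Naive assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", doc.text.strip())
    return [p for p in parts if p]


def corpus_subsets(docs, variable):
    """Group primary-document ids by the value of one categorical variable."""
    subsets = defaultdict(list)
    for doc in docs:
        subsets[doc.categories.get(variable, "undefined")].append(doc.doc_id)
    return dict(subsets)


if __name__ == "__main__":
    corpus = [
        PrimaryDocument("int1", "I like my job. It is hard!", {"gender": "women"}),
        PrimaryDocument("int2", "Work is fine.", {"gender": "men"}),
        PrimaryDocument("int3", "No comment.", {"gender": "women"}),
    ]
    print(elementary_contexts(corpus[0]))
    print(corpus_subsets(corpus, "gender"))
```

Each interview is a primary document, its sentences are its elementary contexts, and grouping by the "gender" variable yields two corpus subsets ("women" and "men") built from whole primary documents.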