www.tlab.it

Elementary Contexts


During the import phase, T-LAB segments the corpus into elementary contexts, both to help the user explore it and, above all, to enable the analyses that require computing co-occurrences.
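The minimal Python sketch below shows why segmentation matters for co-occurrence analysis. It assumes that two words co-occur when they appear in the same elementary context, and that each context counts once per word pair. Tokenization uses a naive `\w+` regex, not T-LAB's own lexical processing, and `cooccurrences` is a hypothetical helper.

```python
from __future__ import annotations

import re
from collections import Counter
from itertools import combinations


def cooccurrences(contexts: list[str]) -> Counter:
    """Count, for each unordered word pair, how many contexts contain both words.

    Assumption: presence-based counting (a pair is counted at most once per context).
    """
    counts: Counter = Counter()
    for ctx in contexts:
        # Distinct lower-cased word forms, sorted so each pair has one canonical order.
        words = sorted(set(re.findall(r"\w+", ctx.lower())))
        counts.update(combinations(words, 2))
    return counts
```

For example, `cooccurrences(["red apple", "green apple", "red apple pie"])` counts the pair `("apple", "red")` twice, because it occurs in two separate contexts. Changing how the corpus is segmented changes these counts.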


Depending on the user's choice, elementary contexts can be:

1 - Sentences

Elementary contexts ending with a punctuation mark (. ? !), with a length between 50 and 1,000 characters.
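As a rough illustration of the sentence rule, the sketch below splits text after each `.`, `?` or `!` and flags spans that fall outside the 50-1,000 character range. It is a naive stand-in, not T-LAB's implementation: abbreviations such as "e.g." will cause false splits. The text does not say how out-of-range spans are handled, so the hypothetical `check_lengths` helper only reports them.

```python
from __future__ import annotations

import re

# Bounds taken from the rule above: 50-1,000 characters.
MIN_LEN, MAX_LEN = 50, 1000


def split_sentences(text: str) -> list[str]:
    """Split right after each run of '.', '?' or '!' (a simplifying assumption)."""
    # Either: non-terminators followed by one or more terminators,
    # or: a trailing remainder that has no terminator at all.
    parts = re.findall(r"[^.?!]*[.?!]+|[^.?!]+$", text)
    return [p.strip() for p in parts if p.strip()]


def check_lengths(sentences: list[str]) -> list[tuple[str, bool]]:
    """Pair each sentence with True if its length lies within MIN_LEN..MAX_LEN."""
    return [(s, MIN_LEN <= len(s) <= MAX_LEN) for s in sentences]
```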

 

2 - Chunks

Elementary contexts of comparable length, each made up of one or more sentences.

More precisely:

- T-LAB treats as an elementary context every sequence of words delimited by full stops and carriage returns whose length is less than 400 characters;

- if no full stop occurs within the maximum length, T-LAB searches for other punctuation marks in the following order (? ! ; : ,). If none is found, it segments the text according to a statistical criterion, without splitting any lexical unit.
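The chunk rules above can be sketched as a greedy procedure. It assumes that each chunk is filled with as much text as fits below 400 characters and then cut at the best available boundary. The function names are hypothetical. The statistical criterion is not described, so the last fallback here simply cuts at the last space, which at least respects the rule of not splitting lexical units.

```python
from __future__ import annotations

# Limit from the text: a chunk must be shorter than 400 characters.
MAX_CHUNK = 400
# Primary boundaries: full stop and line break (carriage return).
PRIMARY = (".", "\n")
# Fallback punctuation, tried one mark at a time in this priority order.
FALLBACK = ("?", "!", ";", ":", ",")


def _cut_point(window: str) -> int:
    """Return the index just past the boundary chosen inside window."""
    best = max(window.rfind(d) for d in PRIMARY)
    if best > 0:
        return best + 1
    for mark in FALLBACK:
        i = window.rfind(mark)
        if i > 0:
            return i + 1
    # Stand-in for the unspecified statistical criterion (an assumption):
    # cut at the last space so that no word is split in two.
    i = window.rfind(" ")
    return i + 1 if i > 0 else len(window)


def split_chunks(text: str, max_len: int = MAX_CHUNK) -> list[str]:
    """Greedily pack text into chunks shorter than max_len characters."""
    chunks: list[str] = []
    pos = 0
    while pos < len(text):
        rest = text[pos:]
        if len(rest) < max_len:
            piece = rest                      # the remainder already fits
        else:
            window = rest[: max_len - 1]      # longest allowed chunk
            piece = rest[: _cut_point(window)]
        pos += len(piece)
        if piece.strip():
            chunks.append(piece.strip())
    return chunks
```

Cutting at the *last* boundary inside the window is what keeps chunk lengths comparable: short sentences are grouped together rather than emitted one by one.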



3 - Paragraphs

Elementary contexts ending with a punctuation mark (. ? !) followed by a line break (return key), with a maximum length of 2,000 characters.
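Here is a minimal sketch of the paragraph rule. It assumes that a boundary is a `.`, `?` or `!` immediately followed by a line break, optionally with spaces or tabs in between. A line break without final punctuation does not end a paragraph. The text does not say how paragraphs longer than 2,000 characters are treated, so the hypothetical `too_long` helper only lists them.

```python
from __future__ import annotations

import re

MAX_PARAGRAPH = 2000  # characters, per the rule above

# Boundary: a terminator (checked by lookbehind) then optional blanks and a newline.
_PARA_END = re.compile(r"(?<=[.?!])[ \t]*\r?\n")


def split_paragraphs(text: str) -> list[str]:
    """Split text at final punctuation followed by a line break."""
    return [p.strip() for p in _PARA_END.split(text) if p.strip()]


def too_long(paragraphs: list[str], max_len: int = MAX_PARAGRAPH) -> list[str]:
    """Return the paragraphs that exceed max_len characters."""
    return [p for p in paragraphs if len(p) > max_len]
```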

4 - Short Texts

This option is available only when no text in the corpus exceeds 2,000 characters (e.g. responses to open-ended questions).
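The availability condition for this option amounts to a check on the longest text in the corpus. Here is a minimal sketch with a hypothetical function name. Treating an empty corpus as ineligible is an assumption.

```python
from __future__ import annotations

SHORT_TEXT_LIMIT = 2000  # characters, per the rule above


def short_texts_available(texts: list[str], limit: int = SHORT_TEXT_LIMIT) -> bool:
    """True if the corpus is non-empty and its longest text fits within limit."""
    # An empty corpus is treated as not eligible (assumption; not covered by the rule).
    return bool(texts) and max(len(t) for t in texts) <= limit
```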

N.B.:

- the file corpus_segments.dat contains the result of the corpus segmentation;

- in T-LAB, the Concordances option lets the user check the elementary contexts in which each word (or lemma) occurs.
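A concordance lookup can be sketched as a search over the list of elementary contexts. The hypothetical `concordance` function below matches whole word forms case-insensitively. Lemma lookup would require a lemmatizer and is left out. Nothing here reflects T-LAB's internal format for corpus_segments.dat.

```python
from __future__ import annotations

import re


def concordance(contexts: list[str], word: str) -> list[tuple[int, str]]:
    """Return (index, context) for every elementary context containing word.

    Matching is case-insensitive and on whole word forms only
    ("cat" matches "CAT" but not "category").
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [(i, c) for i, c in enumerate(contexts) if pattern.search(c)]
```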