www.tlab.it

Corpus Preparation


To be imported into T-LAB, each corpus to be analysed must be a plain-text file in ASCII/ANSI format with the ".txt" extension.
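As an illustration only, the following Python sketch shows one way to save a text as an ANSI ".txt" file before importing it. It assumes that "ANSI" corresponds to the Windows-1252 code page (typical of Western European Windows systems, but an assumption here); the function name `save_as_ansi_txt` is hypothetical and is not part of T-LAB.

```python
from pathlib import Path


def save_as_ansi_txt(text: str, target: Path, encoding: str = "cp1252") -> Path:
    """Write `text` to a ".txt" file using an ANSI code page.

    Assumption: "ANSI" means Windows-1252 (cp1252); pass another code page
    if your system uses a different one.
    """
    # Force the ".txt" extension regardless of the name that was supplied.
    target = target.with_suffix(".txt")
    # Characters that do not exist in the code page are replaced with "?"
    # instead of raising an error.
    target.write_bytes(text.encode(encoding, errors="replace"))
    return target
```

Characters outside the chosen code page are replaced with "?", so it is worth checking the output file for unexpected question marks.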

In the case of a single text (or a corpus considered as a single text), no further preparation is needed.

When, instead, the corpus consists of several texts and contains coding marks referring to variables, two kinds of criteria must be observed in the preparation phase:

a) structural criteria;

b) formal criteria.

N.B.:

- both the Gather your texts option and the MS Excel macro included in the T-LAB installation package (see the “…\My Documents\T-LAB” folder) allow a rapid and automatic transformation of the texts to be analysed into a corpus which is ready to be imported (see below).

- we advise an orthographic review of the material to be analysed. Moreover, if important acronyms are written with punctuation marks between their letters (e.g. "U.N."), we recommend transforming them into a single string (e.g. "U_N"); this is because, in the normalization phase, T-LAB interprets punctuation marks as separators.

- at the end of the corpus preparation phase, we recommend creating a new folder that contains only the corpus to be imported.
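The acronym recommendation in the notes above could be applied automatically with a short script. The following Python sketch is only an illustration under stated assumptions: it treats any run of two or more single capital letters, each followed by a full stop, as an acronym. The function name `join_acronyms` and the pattern are hypothetical and are not part of T-LAB.

```python
import re

# Assumption: an acronym is two or more single capital letters,
# each followed by a full stop, e.g. "U.N." or "U.S.A.".
ACRONYM = re.compile(r"\b(?:[A-Z]\.){2,}")


def join_acronyms(text: str) -> str:
    """Rewrite dotted acronyms as single strings, e.g. "U.N." -> "U_N"."""

    def _join(match: re.Match) -> str:
        # Drop the trailing full stop, then join the letters with underscores.
        letters = match.group(0).rstrip(".").split(".")
        return "_".join(letters)

    return ACRONYM.sub(_join, text)


print(join_acronyms("The U.N. met today."))  # The U_N met today.
```

Note that the final full stop of the acronym is removed as well, so an acronym at the very end of a sentence also loses the sentence-closing full stop; a manual check of the result remains advisable.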