T-LAB 10.2 - ON-LINE HELP - T-LAB Tools for Text Analysis

If the corpus consists of a single text and no variables are used, no further preparation is required: it is possible to continue with the importation phase.

When, on the other hand, the corpus is made up of several text documents and/or categorical variables are used, the corpus must be prepared with the Corpus Builder tool (see above), which automatically applies the following criteria:

Each text or subset of it (the "parts" defined by variables and/or IDnumber) is preceded by a coding line.

Each coding line has this format:

- It begins with a string of four asterisks (****) followed by a blank space. T-LAB reads this string as: "here begins a user-defined text or a context unit".

- It continues with strings made up of a single asterisk and a label; these define cases (IDnumber), variables and their respective categories.

- It ends with a line break (Return key).
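The format above can be sketched in a few lines of Python. This is only an illustration, not part of T-LAB: the function name build_coding_line and its parameters are hypothetical, and padding the IDnumber to four digits is an assumption based on the example given in this section.

```python
def build_coding_line(variables, id_number=None):
    """Return one coding line (without the final line break).

    variables: dict mapping a variable label to its category label.
    id_number: optional integer case identifier (hypothetical parameter).
    """
    parts = ["****"]  # marks the start of a text or context unit
    if id_number is not None:
        # Assumption: IDnumber is written zero-padded to four digits.
        parts.append(f"*IDnumber_{id_number:04d}")
    for name, category in variables.items():
        # Variable and category are linked by an underscore.
        parts.append(f"*{name}_{category}")
    # A single blank space separates the elements.
    return " ".join(parts)


line = build_coding_line({"AGE": "ADUL", "SEX": "FEM", "OCC": "PROF"}, id_number=1)
print(line)
```

When the line is written to a corpus file, a line break has to be appended after it, followed by the text it introduces.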

Here are some examples.

The following line introduces a text (or a corpus subset) codified with three variables - AGE, SEX and OCC (occupation) - and their categories (ADUL, FEM, PROF).

**** *AGE_ADUL *SEX_FEM *OCC_PROF

The following line introduces a text (or a corpus subset) codified with the same variables and the IDnumber label.

**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF

The following line introduces a text (or a corpus subset) codified with two variables: YEAR, NEWSP.

**** *YEAR_98 *NEWSP_TIMES
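To make the structure of these examples explicit, the following Python sketch splits a coding line back into its labels. It is not a T-LAB function: the name parse_coding_line is hypothetical, and the sketch assumes that the first underscore in each element separates the variable from its category.

```python
def parse_coding_line(line):
    """Split a coding line into a dict of label -> value.

    Assumption: the first underscore in each element separates
    the variable (or IDnumber) label from its category or value.
    """
    tokens = line.split()  # elements are separated by blank spaces
    if not tokens or tokens[0] != "****":
        raise ValueError("a coding line must begin with '****'")
    fields = {}
    for token in tokens[1:]:
        if not token.startswith("*"):
            raise ValueError(f"element does not begin with an asterisk: {token!r}")
        name, sep, value = token[1:].partition("_")
        if not sep:
            raise ValueError(f"missing underscore in element: {token!r}")
        fields[name] = value
    return fields


print(parse_coding_line("**** *YEAR_98 *NEWSP_TIMES"))
```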

In each coding line these T-LAB rules are observed:

1. No label (IDnumber, variables and variable categories) may contain blank spaces;
2. Each label - both for variables and variable categories - must be between 2 and 25 characters long;
3. Each variable label must be linked to the respective category using an underscore ("_");
4. Between two different variables, that is before the next asterisk, a blank space must be inserted;
5. Each variable and respective category must be assigned for each corpus subset;
6. A maximum of 50 variables can be used, each with a maximum of 150 categories that can be compared;
7. The maximum number of IDnumbers is fixed at 99,999 for short texts (max. 2,000 characters each, e.g. responses to open-ended questions, Twitter messages, etc.) and at 30,000 for all other cases.
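
Some of the rules above can be checked mechanically on a single coding line. The Python sketch below covers rules 1 to 4 and the 50-variable limit of rule 6. It is an illustration only and does not reproduce the checks that T-LAB itself performs: the name check_coding_line is hypothetical, and the sketch assumes that labels contain no asterisks or underscores and that the IDnumber value consists of digits.

```python
import re

# Assumption: labels contain no blank spaces, asterisks or underscores.
VARIABLE_TOKEN = re.compile(r"\*([^\s*_]+)_([^\s*_]+)")
# Assumption: the IDnumber value is made of digits only.
ID_TOKEN = re.compile(r"\*IDnumber_\d+")


def check_coding_line(line, max_variables=50):
    """Return a list of rule violations for one coding line (empty if none)."""
    line = line.rstrip("\r\n")
    if not line.startswith("**** "):
        return ["line does not start with '**** '"]
    problems = []
    n_variables = 0
    for token in line[5:].split(" "):
        if ID_TOKEN.fullmatch(token):
            continue  # the case identifier is not counted as a variable
        match = VARIABLE_TOKEN.fullmatch(token)
        if match is None:
            # Covers rules 1, 3 and 4: stray spaces, missing underscore
            # or missing asterisk all produce a malformed element.
            problems.append(f"malformed element: {token!r}")
            continue
        n_variables += 1
        for label in match.groups():
            if not 2 <= len(label) <= 25:  # rule 2
                problems.append(f"label length out of range (2-25): {label!r}")
    if n_variables > max_variables:  # rule 6, variable count only
        problems.append(f"more than {max_variables} variables")
    return problems


print(check_coding_line("**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF"))
print(check_coding_line("**** *AGE ADUL"))
```

Rules 5 and 7 and the 150-category limit concern the corpus as a whole, so they cannot be verified from one line and are left out of this sketch.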