| 
  
www.tlab.it
 Formal Criteria
 
 
When the corpus consists of a single text and the user
does not use variables, no further operations are required: it is
possible to proceed directly to the import phase.
 
 
 
When, on the other hand, the corpus is made up of several
text documents and/or categorical variables are used, the corpus
must be prepared with the Corpus Builder tool
(see above), which automatically applies the following
criteria:
 
Each text or subset of it (the "parts" defined by
variables and/or IDnumber) is preceded
by a coding line.
 
Each coding line has this
format:
- It begins with a string of four asterisks (****) followed
by a blank space. T-LAB reads this string as: "here
begins a user-defined text or a context unit".
 
- It continues with strings, each made up of a single
asterisk and a label, that define cases (IDnumber), variables
and their respective categories.
 
- It ends with a line break (the Return key).
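
The format just described can be sketched in a few lines of Python. This is only an illustration, not part of T-LAB: the function name build_coding_line and its parameters are hypothetical, and in normal use the Corpus Builder tool writes these lines automatically.

```python
def build_coding_line(variables, id_number=None):
    """Assemble one coding line (hypothetical helper, not a T-LAB function).

    variables: dict mapping variable labels to category labels.
    id_number: optional case identifier, passed as a string such as "0001".
    """
    parts = ["****"]  # every coding line opens with four asterisks
    if id_number is not None:
        parts.append(f"*IDnumber_{id_number}")
    for name, category in variables.items():
        # single asterisk, then the variable and its category joined by "_"
        parts.append(f"*{name}_{category}")
    # one blank space between strings, one line break at the end
    return " ".join(parts) + "\n"


line = build_coding_line({"AGE": "ADUL", "SEX": "FEM", "OCC": "PROF"})
print(line, end="")
```

Passing the identifier as a string keeps any leading zeros intact.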
 
Here are some examples.
 
The following line introduces a text (or a corpus subset)
codified with three variables - AGE, SEX and OCC (occupation) - and
their categories (ADUL, FEM, PROF).
 
**** *AGE_ADUL *SEX_FEM *OCC_PROF
 
  
The following line introduces a text (or a corpus subset)
codified with the same variables and the IDnumber label:
 
**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF
  
 
The following line introduces a text (or a corpus subset)
codified with two variables, YEAR and NEWSP, and their
categories (98, TIMES).
 
**** *YEAR_98 *NEWSP_TIMES
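
To make the structure of these examples explicit, a coding line can be read back into its components. The sketch below is illustrative only: parse_coding_line is a hypothetical name, and it assumes that variable labels contain no underscore, so that the first "_" in each string separates the variable from its category.

```python
def parse_coding_line(line):
    """Split a coding line into (IDnumber or None, {variable: category}).

    Hypothetical helper; assumes variable labels contain no underscore.
    """
    tokens = line.split()  # labels are separated by blank spaces
    if not tokens or tokens[0] != "****":
        raise ValueError("a coding line must begin with '****'")
    id_number, variables = None, {}
    for token in tokens[1:]:
        if not token.startswith("*") or "_" not in token:
            raise ValueError(f"malformed label: {token!r}")
        # drop the leading asterisk, then split at the first underscore
        name, category = token[1:].split("_", 1)
        if name == "IDnumber":
            id_number = category
        else:
            variables[name] = category
    return id_number, variables


print(parse_coding_line("**** *YEAR_98 *NEWSP_TIMES"))
```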
 
The following T-LAB rules apply to every coding line:
 
1. A label (IDnumber, variable or variable
category) cannot contain blank spaces; 
2. Each label, both for variables and for variable categories, must
be between 2 and 25 characters long; 
3. Each variable label must be linked to its category
with an underscore ("_"); 
4. A blank space must be inserted between two different variables,
that is, before the next asterisk; 
5. Every variable, with one of its categories, must be assigned to
every corpus subset; 
6. A maximum of 50 variables can be used, each allowing a maximum of
150 categories which can be compared; 
7. The maximum number of IDnumbers is 99,999 for short texts (max.
2,000 characters each, e.g. responses to open-ended questions,
Twitter messages, etc.) and 30,000 in all other
cases.
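
As a rough illustration of how rules 1, 2, 5 and 6 could be checked before import, the sketch below inspects a list of dictionaries, one per corpus subset, each mapping variable labels to category labels. The function check_coding and its constants are hypothetical and are not part of T-LAB; rule 7 is left out because it depends on the length of the texts themselves.

```python
MIN_LABEL_LEN, MAX_LABEL_LEN = 2, 25    # rule 2
MAX_VARIABLES, MAX_CATEGORIES = 50, 150  # rule 6


def check_coding(subsets):
    """Return a list of rule violations (hypothetical helper, not T-LAB code)."""
    problems = []
    all_variables = set()
    for subset in subsets:
        all_variables |= subset.keys()
    categories = {name: set() for name in all_variables}

    for index, subset in enumerate(subsets, start=1):
        for name, category in subset.items():
            categories[name].add(category)
            for label in (name, category):
                if " " in label:  # rule 1: no blank spaces inside a label
                    problems.append(f"subset {index}: blank space in {label!r}")
                if not MIN_LABEL_LEN <= len(label) <= MAX_LABEL_LEN:  # rule 2
                    problems.append(f"subset {index}: bad length for {label!r}")
        missing = sorted(all_variables - subset.keys())
        if missing:  # rule 5: every variable assigned to every subset
            problems.append(f"subset {index}: missing {', '.join(missing)}")

    if len(all_variables) > MAX_VARIABLES:  # rule 6: at most 50 variables
        problems.append(f"{len(all_variables)} variables exceed {MAX_VARIABLES}")
    for name in sorted(categories):
        if len(categories[name]) > MAX_CATEGORIES:  # rule 6: at most 150 categories
            problems.append(f"{name}: more than {MAX_CATEGORIES} categories")
    return problems


print(check_coding([{"AGE": "ADUL", "SEX": "FEM"}, {"AGE": "YOUNG"}]))
```

In this example the second subset lacks the SEX variable, so one violation of rule 5 is reported.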
 
  
 
  
   |