www.tlab.it

Multi-Word list


This option allows the user to create/modify Multi-Word lists within the following form.

Each Multiwords.txt file can contain up to 5,000 lines. Each line holds one multi-word expression of at most 50 characters, without punctuation marks.

Here are some lines of Multiwords.txt in the correct format:

Seattle people
Chamber of commerce
National Health Service
America's greatest traditions

etc.
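As an illustration only, the limits described above could be checked before a list is loaded. The following Python sketch is not part of T-LAB; the function name check_multiword_list is hypothetical, and it assumes that apostrophes are tolerated (the sample list contains "America's") while all other ASCII punctuation is rejected.

```python
import string

MAX_LINES = 5000  # maximum number of lines stated above
MAX_CHARS = 50    # maximum length of one entry stated above
# Assumption: apostrophes are allowed, because the sample list contains "America's".
FORBIDDEN = set(string.punctuation) - {"'"}


def check_multiword_list(lines):
    """Return (line_number, problem) tuples; an empty list means no problem was found.

    Line number 0 is used for problems that concern the file as a whole.
    """
    problems = []
    if len(lines) > MAX_LINES:
        problems.append((0, f"too many lines: {len(lines)} > {MAX_LINES}"))
    for number, line in enumerate(lines, start=1):
        entry = line.strip()
        if not entry:
            continue  # blank lines are skipped in this sketch
        if len(entry) > MAX_CHARS:
            problems.append((number, "longer than 50 characters"))
        if any(ch in FORBIDDEN for ch in entry):
            problems.append((number, "contains punctuation"))
        if len(entry.split()) < 2:
            problems.append((number, "not a multi-word expression"))
    return problems
```

For example, check_multiword_list(open("Multiwords.txt", encoding="utf-8").read().splitlines()) would report the offending line numbers, if any. The file encoding is also an assumption here.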

By clicking on the "Use this list…" button, the user can quickly and automatically transform the multi-words present in a corpus into single strings that can be recognized and classified by T-LAB (e.g. "secretary of state" turns into "secretary_of_state").
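The effect of this transformation can be sketched in a few lines of Python. This is only an approximation for illustration, not T-LAB's actual procedure: the function name join_multiwords is hypothetical, and the sketch assumes case-insensitive matching, whole-word boundaries, and that longer expressions take priority over shorter ones they contain.

```python
import re


def join_multiwords(text, multiwords):
    """Replace each listed multi-word found in text with an underscore-joined string."""
    # Longest entries first, so a longer expression wins over a shorter one it contains.
    entries = sorted((m.strip() for m in multiwords if m.strip()), key=len, reverse=True)
    if not entries:
        return text
    # Words in the corpus may be separated by any run of whitespace.
    alternatives = [r"\s+".join(re.escape(w) for w in e.split()) for e in entries]
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)",
        re.IGNORECASE,  # assumption: matching ignores letter case
    )
    # Keep the words exactly as written in the corpus and join them with underscores.
    return pattern.sub(lambda m: "_".join(m.group(0).split()), text)
```

With the list entry "secretary of state", the call join_multiwords("The secretary of state spoke.", ["secretary of state"]) returns "The secretary_of_state spoke.". Because the sketch preserves the corpus spelling, "Chamber of Commerce" in a text would become "Chamber_of_Commerce" even if the list entry is written "Chamber of commerce".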

Once it has run, this option generates a new file (New_Corpus.txt) which, after being properly renamed, can be analysed with T-LAB.

To verify or use Multi-Word lists while importing a new corpus, the user has to select the "Advanced" option in the following form: