N.B.: Chinese texts are segmented with the 'Pan Gu Segment' library (http://pangusegment.codeplex.com/).
A.2-Dictionary-based lemmatization for nine (9) further languages;
A.3-Stemming algorithms for fifteen (15) languages;
N.B.: The main difference between (a) lemmatization and (b) stemming lies in how the inflected forms of each word are normalized: (a) lemmatization (see https://en.wikipedia.org/wiki/Lemmatisation) groups together the inflected forms of a word so that they can be analysed as a single item, identified by the word's lemma, or dictionary form (e.g.: 'arguing' -> 'argue'); (b) stemming (see https://en.wikipedia.org/wiki/Stemming) usually just removes inflectional endings, so the resulting stem need not be identical to the morphological root of the word (e.g.: 'arguing' -> 'argu').
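The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not the T-LAB implementation: the dictionary entries, the suffix list and the function names (lemmatize, stem) are hypothetical and chosen only to reproduce the 'arguing' example above.

```python
# Toy lemma dictionary: inflected form -> dictionary form (hypothetical entries).
LEMMA_DICT = {
    "arguing": "argue",
    "argued": "argue",
    "argues": "argue",
}

# Endings removed by the toy stemmer, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "e", "s")


def lemmatize(word):
    """Dictionary-based: return the lemma, or the lowercased word if not listed."""
    w = word.lower()
    return LEMMA_DICT.get(w, w)


def stem(word, min_stem_len=3):
    """Rule-based: strip the first matching ending, keeping a minimum stem length."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem_len:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for form in ("arguing", "argued", "argues", "argue"):
        print(form, "lemma:", lemmatize(form), "stem:", stem(form))
```

In this sketch all four forms share the lemma 'argue', which is a real word, and the stem 'argu', which is not. The lemmatizer is only as good as its dictionary, whereas the stemmer needs no dictionary but can produce non-words.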
The new languages for which T-LAB Plus 2017 supports automatic lemmatization or stemming are listed below.
LEMMATIZATION: Catalan, Croatian, Polish, Romanian, Russian, Serbian, Slovak, Swedish, Ukrainian.
STEMMING: Arabic, Bengali, Bulgarian, Czech, Danish, Dutch, Finnish, Greek, Hindi, Hungarian, Indonesian, Marathi, Norwegian, Persian, Turkish.
In the setup form, the six languages(*) for which T-LAB already supported automatic lemmatization can be selected through the button on the left (see 'A' below), while the new ones can be selected through the button on the right (see 'B' below).
(*) English, French, German, Italian, Portuguese and Spanish.