T-LAB Plus 2022

T-LAB Plus 2022 was released on October 16, 2021.

The main new features and improvements in this version of the software are illustrated below.

1 - The way T-LAB processes Chinese texts has been refined, and three built-in examples in this language have been added: the Analects of Confucius, the 2020 annual report on China's policies and actions to address Climate Change, and ten thousand Weibo posts related to COVID-19 (see the pictures below).


2 - The ‘Open Table’ option of the Corpus Builder tool now makes it easy to import data in three additional formats: .SAV (i.e. SPSS files), .JSON (e.g. Twitter data) and .XML. Moreover, generating a corpus from a data table with thousands of records is now faster.
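
Importing .JSON data essentially means flattening an array of objects (for example, one object per post) into rows of a table. The sketch below illustrates that idea with Python's standard `json` module; it is not T-LAB's import routine, and the field names (`id`, `text`, `lang`, `user.name`) and the dotted-path convention for nested fields are hypothetical choices for illustration only.

```python
import json


def get_path(record, path):
    """Follow a dotted path such as 'user.name' through nested objects (hypothetical convention)."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""  # missing field -> empty cell
        value = value[key]
    return value


def json_to_rows(raw, fields):
    """Flatten a JSON array of objects into table rows, one row per record."""
    return [[str(get_path(rec, f)) for f in fields] for rec in json.loads(raw)]


# Hypothetical sample: two posts, the second one lacking some fields.
POSTS = (
    '[{"id": 1, "text": "Stay home", "lang": "en", "user": {"name": "ana"}},'
    ' {"id": 2, "text": "Wash hands"}]'
)
rows = json_to_rows(POSTS, ["id", "text", "lang", "user.name"])
```

Fixing the column list up front keeps every row the same length even when records have different fields, which is what a tabular corpus builder needs.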


3 - The way T-LAB imports and exports .XLS and .XLSX files has been improved, and Microsoft Office no longer needs to be installed. Also, when importing .CSV files, the Corpus Builder tool automatically detects delimiters, and the user can now choose from the main menu the default format of exported .CSV files (see the picture below).
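
Automatic delimiter detection can be illustrated with `csv.Sniffer` from Python's standard library, which guesses the delimiter from how consistently candidate characters occur across lines. This is a minimal sketch of the general technique, not T-LAB's detector; the candidate set and the comma fallback are assumptions.

```python
import csv
import io


def read_csv_autodetect(text, candidates=",;\t|"):
    """Guess the delimiter with csv.Sniffer, then parse the whole text."""
    try:
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=candidates).delimiter
    except csv.Error:
        delimiter = ","  # assumed fallback when no consistent delimiter is found
    return delimiter, list(csv.reader(io.StringIO(text), delimiter=delimiter))


DATA = "id;author;year\n1;Confucius;2021\n2;Cicero;2022\n"
delimiter, rows = read_csv_autodetect(DATA)
```

Only a sample of the file is sniffed, so a long file is scanned once for parsing rather than twice.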


4 - Since many T-LAB tools use clustering algorithms, the Cluster Analysis sub-menu has been changed, as shown below, to guide users. When one of the options is selected, the user is automatically redirected to the corresponding tool.


For example, when choosing the first of the above options, the following window will appear.


As a reminder, here is a dendrogram summarizing the main T-LAB tools to which the Cluster Analysis sub-menu may redirect (see the tools marked with a red bullet point below).


5 - Before performing an SVD (i.e. Singular Value Decomposition) of a co-occurrence matrix with up to 5,000 columns, it is now possible to set several advanced options for word embedding.


Once the user has set the advanced options (e.g. co-occurrence context and co-occurrence threshold), T-LAB performs the following steps: (1) it builds the co-occurrence matrix; (2) it computes the PPMI (Positive Pointwise Mutual Information) values; (3) it performs an SVD; (4) it extracts the first 50 dimensions (i.e. the word embeddings).
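
The four steps can be sketched with NumPy as follows. This is an illustration of the standard PPMI + SVD technique under stated assumptions, not T-LAB's code: a fixed token window stands in for the "co-occurrence context", `min_count` stands in for the "co-occurrence threshold" (both names are hypothetical), and scaling the left singular vectors by the singular values is one common convention among several.

```python
import numpy as np


def cooccurrence_matrix(sentences, window=2):
    """Step 1: count how often each word appears within `window` tokens of another."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts


def ppmi(counts, min_count=1):
    """Step 2: PPMI = max(0, log(P(w,c) / (P(w) P(c)))), after zeroing rare pairs."""
    counts = np.where(counts >= min_count, counts, 0.0)  # assumed threshold rule
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # log(0) and 0/0 cells carry no association
    return np.maximum(pmi, 0.0)


def word_embeddings(counts, dims=50, min_count=1):
    """Steps 3-4: SVD of the PPMI matrix, keeping the first `dims` dimensions."""
    u, s, _ = np.linalg.svd(ppmi(counts, min_count), full_matrices=False)
    dims = min(dims, len(s))
    return u[:, :dims] * s[:dims]  # scale each dimension by its singular value


corpus = [["climate", "policy", "action"], ["climate", "policy", "report"]]
vocab, counts = cooccurrence_matrix(corpus, window=1)
vectors = word_embeddings(counts)
```

In this toy corpus "action" and "report" share exactly the same context ("policy"), so they receive identical vectors even though they never occur together. A dense SVD is used here for simplicity; for matrices near 5,000 columns a truncated sparse solver would usually be faster.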


Also, by clicking the Associations button, the user can explore the second-order similarities of each item.

N.B.: While first-order indexes point out phenomena concerning the syntagmatic axis ('in praesentia' combination and proximity, i.e. each word 'near to' the other), second-order indexes point out phenomena concerning the paradigmatic axis ('in absentia' association and similarity, i.e. quasi-synonymy between key terms used within the same corpus).
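
The distinction can be made concrete with a toy example: a first-order index reads the co-occurrence count of two words directly, whereas a second-order index compares the two words' whole co-occurrence profiles, here with cosine similarity. The vocabulary and counts are invented for illustration, and cosine on raw counts is an assumption; T-LAB's exact second-order index is not specified here, and the same comparison could be applied to embedding vectors instead of raw rows.

```python
import math

VOCAB = ["doctor", "physician", "hospital", "patient", "guitar", "music"]
# Toy symmetric co-occurrence counts (invented); row/column order follows VOCAB.
COOC = [
    [0, 0, 4, 5, 0, 0],  # doctor
    [0, 0, 3, 4, 0, 0],  # physician
    [4, 3, 0, 2, 0, 0],  # hospital
    [5, 4, 2, 0, 0, 0],  # patient
    [0, 0, 0, 0, 0, 6],  # guitar
    [0, 0, 0, 0, 6, 0],  # music
]


def first_order(w1, w2):
    """Syntagmatic link: how often the two words occur together."""
    return COOC[VOCAB.index(w1)][VOCAB.index(w2)]


def second_order(w1, w2):
    """Paradigmatic link: cosine similarity of the two co-occurrence profiles."""
    u, v = COOC[VOCAB.index(w1)], COOC[VOCAB.index(w2)]
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0
```

Here "doctor" and "physician" never co-occur (first-order index 0), yet their second-order similarity is close to 1 because both appear near "hospital" and "patient".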


Moreover, depending on the size of the corpus and on the clustering method, up to 30 clusters (K-Means method) or up to 20 cluster partitions (Hierarchical method) can be obtained and explored.
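
As an illustration of the K-Means option, the sketch below clusters word vectors with the textbook algorithm: assign each vector to its nearest centroid, recompute centroids, and repeat until assignments stop changing. It is a minimal pure-Python sketch, not T-LAB's implementation; Euclidean distance, random initial centroids and the fixed `seed` are assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-Means: returns a cluster label per point and the final centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


# Two well-separated groups of 2-D "embeddings" (toy data).
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(vectors, k=2)
```

With real embeddings each point would be a 50-dimensional vector; the algorithm is unchanged.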


6 - Additional tables can now be exported, allowing the user to process T-LAB outputs with other data-analysis software. These include the adjacency matrix created by the Sequences and Network Analysis tool and the co-occurrence matrix created by the Co-Word Analysis tool, both with up to 5,000 columns (see the pictures below).
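
A typical next step in other software is turning an exported square matrix into an edge list for network analysis. The sketch below assumes a hypothetical layout (semicolon-delimited, labels in the first row and first column, rows in the same order as columns); check the actual exported file before relying on it. Setting `directed=False` keeps each pair of a symmetric co-occurrence matrix only once, while the default keeps every non-zero cell, as would suit a directed adjacency matrix.

```python
import csv
import io


def matrix_to_edges(text, delimiter=";", directed=True):
    """Convert a labelled square matrix in CSV form into (source, target, weight) edges."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    labels = rows[0][1:]  # first header cell is assumed to be empty
    edges = []
    for i, row in enumerate(rows[1:]):
        source = row[0]
        for j, cell in enumerate(row[1:]):
            if not directed and j <= i:
                continue  # symmetric matrix: keep each pair once, skip diagonal
            weight = float(cell)
            if weight:
                edges.append((source, labels[j], weight))
    return edges


# Hypothetical export of a 3 x 3 co-occurrence matrix.
EXPORTED = ";climate;policy;china\nclimate;0;3;1\npolicy;3;0;2\nchina;1;2;0\n"
undirected = matrix_to_edges(EXPORTED, directed=False)
```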


7 - Latest update (5 May 2022): automatic lemmatisation for the Latin language added.

In May 2002, a unique version of T-LAB (named ‘Pro Lingua Latina’) was released, which allowed scholars to analyse Latin texts using content analysis and text mining techniques.

Now, some twenty years later, at a time when Europeans are examining their historical and cultural roots, we thought it would be interesting to take another look at the language which has been spoken in Western countries for more than two thousand years and which has shaped their culture, their values, their institutions and even the way they still think.

The new update, which performs a dictionary-based lemmatisation of Latin texts and allows the exploration of both their semantic and thematic dimensions, includes four additional demo files on various topics: agriculture (Cato the Elder, De Agri Cultura), law (Justinian the Great, Institutiones), politics (Cicero, De Re Publica) and philosophy (Augustine of Hippo, Confessiones).
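
Dictionary-based lemmatisation, in its simplest form, maps each inflected form found in the text to its headword through a lookup table. The sketch below uses a tiny hypothetical lexicon and leaves unknown forms unchanged; it is not T-LAB's dictionary or procedure, and a real Latin lexicon must also handle ambiguous forms (e.g. "est" can come from "sum" or "edo") and spelling variants such as u/v and i/j.

```python
import re

# Hypothetical mini-lexicon: inflected form -> lemma.
LATIN_LEMMAS = {
    "ager": "ager", "agri": "ager", "agros": "ager",
    "agricola": "agricola", "agricolae": "agricola",
    "colit": "colo", "colunt": "colo",
    "est": "sum", "sunt": "sum",
}


def lemmatise(text, lexicon=LATIN_LEMMAS):
    """Lowercase, split into letter-only tokens, and replace each known form by its lemma."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # Unicode letters only
    return [lexicon.get(t, t) for t in tokens]  # unknown forms are kept as-is


lemmas = lemmatise("Agricolae agros colunt.")
```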