T-LAB 10

T-LAB Plus 2022

16 October 2021

T-LAB 10.6 was released on January 10, 2025.

When analysing an entire corpus or a sub-set of it (e.g. a topic or a thematic cluster) any co-occurrence network (see Picture 1 below) can now be displayed by using the Maximum Spanning Tree (MaxST), i.e. a graph with a simplified structure which is easier to explore and to interpret (see Picture 2 below).

By focusing on the most relevant features (i.e., words) and by discarding weaker connections, the MaxST offers two main advantages:

1- it helps to identify the most significant connections (i.e., edges with the largest weights) between words, allowing the network to focus on the strongest co-occurrence relationships;

2- it can reveal meaningful semantic structures or clusters in the text, which might be obscured by weaker connections in a dense network.

For these reasons, MaxST can also be used for similarity analysis tasks in the study of social representations (see ‘arbre maximum’ and ‘analyse de similitude’).

All MaxST graphs produced by T-LAB, which uses Kruskal's algorithm, allow the user to show or hide the weights of the various connections, which are association index values. By default, these graphs use a different colour for each community (i.e., word cluster) found by the Louvain algorithm.
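As an illustration of the idea, the following minimal Python sketch applies Kruskal's algorithm to a small weighted word network. It is not T-LAB's implementation: the function name spanning_tree, the maximize flag and the edge weights are hypothetical and chosen only for this example.

```python
def spanning_tree(edges, maximize=True):
    """Kruskal's algorithm on an undirected weighted graph.

    edges: iterable of (word_a, word_b, weight) tuples.
    Returns the edges kept in the spanning tree (or forest).
    """
    parent = {}

    def find(node):
        # Union-find lookup with path compression.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    # Strongest edges first for a MaxST, weakest first for a MinST.
    for a, b, w in sorted(edges, key=lambda e: e[2], reverse=maximize):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so no cycle
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree


# Hypothetical association index values between four words.
edges = [
    ("climate", "change", 0.9),
    ("climate", "policy", 0.4),
    ("change", "policy", 0.6),
    ("policy", "energy", 0.7),
    ("climate", "energy", 0.2),
]
max_tree = spanning_tree(edges)                  # keeps the strongest links
min_tree = spanning_tree(edges, maximize=False)  # keeps the weakest links
```

In both cases the four words end up connected by exactly three edges; only the sort order of the candidate edges changes.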

To obtain any MaxST graph with nodes (i.e., words) ranging from 20 to 500, just click on the corresponding icon, which is present in the following three T-LAB tools:

1- Co-Word Analysis tool

2- Co-occurrence Toolkit tool

3- Graph Maker tool

When clicking the MaxST icon, the user can also choose to display a MinST (i.e., Minimum Spanning Tree), which - by prioritizing edges with the smallest weights - is particularly valuable in scenarios where highlighting weak or subtle connections between words is important.

N.B.: As the Graph Maker option is available in many T-LAB tools, users have plenty of opportunities to deepen their analysis. For example, when using the Word Associations tool, they can explore how all items (or a sub-set of them) associated with a target word interact with each other (see pictures below).

Among the various improvements introduced in T-LAB 10.6, the following are also noteworthy:

a) depending on the chart type, there are additional customization options available (see pictures below):

- Word-Associations with values

- Ego-network with arrows (Sequence and Network Analysis tool)

- Network graphs with convex hulls for clusters

b) all .docx and .xlsx files created with the latest versions of Word and Excel can be easily imported and processed.

c) the Text Screening tool allows the user to open and edit files in five different formats before importing them (see picture below).

Starting from T-LAB 10.3, Interactive Heatmaps in HTML format can be created whenever a table is displayed in which rows represent words and columns represent variable values. This option, available by right-clicking, is particularly useful when the values in the table have been obtained with any comparative or thematic analysis tool. For example, the following heatmaps were obtained by performing a thematic analysis of two distinct datasets: the first concerning reports on climate change and the second concerning the “Artificial Intelligence Act” of the European Commission.

When any scatter plot is displayed, a further right-click option allows the user to create and view highly customizable Interactive Scatter Charts in HTML format (see pictures below).

When performing a Co-Word Analysis with a Hierarchical Clustering option, an Interactive Chart in HTML format (see pictures below) allows the user to easily explore which words belong to each cluster (i.e., thematic nucleus) by hovering over the corresponding label.

A new option has been added to the Co-occurrence Toolkit, which – when selecting any association index – allows the user to plot both radial diagrams concerning target words and two-dimensional scatterplots obtained by t-SNE on the selected index values, i.e. by performing a type of Co-Word Analysis (see pictures below).

Finally, the interactive visualization of Word Trees (i.e., Interactive Visual Concordance) is now also available when using both the Word Associations and the Sequence Analysis tools.
N.B.: As - in these cases - the keywords are lemmas (i.e., groups of words related to the same root), both the lemmas and the corresponding words are displayed (see picture below).

T-LAB 10.2 was released on September 24, 2023.

A new tool, named Co-occurrence Toolkit, has been added.

This tool, which can be used for a variety of tasks, offers a set of techniques for building and analysing word co-occurrence matrices with up to 5,000 columns.

The matrices to be built can be both symmetric and asymmetric, and they can represent the co-occurrences of the words either within the whole corpus or within a subset of it.

N.B.: In the case of word co-occurrences, the difference between symmetric and asymmetric matrices is that symmetric matrices assume that the order of words does not matter (i.e., they are represented as undirected graphs where the values in a row and a column are the same), while asymmetric matrices take into account the direction of co-occurrence and, for this reason, are represented as directed graphs where the values in a row (i.e., successor) and a column (i.e., predecessor) are not necessarily the same.
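The difference can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: two words are taken to co-occur when they are adjacent within a context unit, which is not necessarily the rule used by T-LAB. The function name cooccurrences and the sample contexts are hypothetical.

```python
from collections import Counter


def cooccurrences(contexts):
    """Count adjacent word pairs with and without regard to order."""
    directed = Counter()    # keys are (first, second): order matters
    undirected = Counter()  # keys are frozensets: order is ignored
    for words in contexts:
        for first, second in zip(words, words[1:]):
            if first == second:
                continue  # skip a word immediately followed by itself
            directed[(first, second)] += 1
            undirected[frozenset((first, second))] += 1
    return directed, undirected


# Hypothetical context units, already reduced to keywords.
contexts = [
    ["climate", "change", "policy"],
    ["policy", "change", "climate"],
    ["climate", "change"],
]
directed, undirected = cooccurrences(contexts)

# Asymmetric case: the two directions have different counts.
forward = directed[("climate", "change")]
backward = directed[("change", "climate")]
# Symmetric case: a single count for the unordered pair.
either_way = undirected[frozenset(("climate", "change"))]
```

Here the directed counts for the pair differ by direction, while the undirected count is their sum.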

Whichever tool you are using, the way to export tables and graphs is very simple (see picture below).

Below is a short description of some of the new analysis options (click here for the complete documentation).

The TOPIC ANALYSIS of the word co-occurrence matrix uses the same algorithm as the T-LAB Modeling of Emerging Themes tool (i.e., Latent Dirichlet Allocation and Gibbs Sampling); however, in this case, both indexes of the matrix (i.e., ‘i’ and ‘j’) refer to the same words and the values correspond to their co-occurrences. As can be verified, the results of this approach are quite interesting and consistent.

The RELEVANT WORDS - SVD provides a relevance score for each word, which is computed by summing the square of its first 3 dimensions (i.e., the eigenvectors), each one multiplied by its corresponding singular value, and then by computing the square root of that sum.

This means that the words with the highest scores are the farthest from the point of origin, which is the point where the horizontal axis (x-axis) and the vertical axis (y-axis) intersect. For this reason, they are the words that contribute most to organizing semantic polarizations, which can also have emotional connotations.
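The score described above can be sketched as follows. The sketch assumes one reading of the description, namely that each coordinate is first multiplied by its singular value and the product is then squared; the coordinates, the singular values and the function name relevance_scores are hypothetical, and the SVD itself is taken as already computed.

```python
import math


def relevance_scores(coords, singular_values, n_dims=3):
    """Distance of each word from the origin in the scaled SVD space."""
    scores = {}
    for word, vec in coords.items():
        # Scale each of the first n_dims coordinates by its singular value.
        scaled = [vec[k] * singular_values[k] for k in range(n_dims)]
        # Euclidean norm: square root of the sum of squares.
        scores[word] = math.sqrt(sum(x * x for x in scaled))
    return scores


# Hypothetical outputs of an SVD (not computed here).
singular_values = [3.0, 2.0, 1.0]
coords = {
    "war": [0.6, 0.0, 0.0],
    "peace": [0.0, 0.5, 0.0],
    "thing": [0.1, 0.1, 0.1],
}
scores = relevance_scores(coords, singular_values)
most_relevant = max(scores, key=scores.get)
```

Words near the origin on all three dimensions receive low scores, while words that load strongly on at least one dimension receive high ones.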

The SEMANTIC DIVERSITY of each word (i.e., its ability to have links with many other words) is measured by means of the entropy index.

N.B.: The average entropy of the word co-occurrence matrix can be used to quantify the ‘complexity’ of a text, since more complex texts (i.e., texts in which many words co-occur with a variety of other words) tend to have higher entropy than simpler texts (i.e., texts in which many words co-occur with only a few other words and, for that reason, are more predictable).
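A minimal sketch of such a measure is given below. It assumes that the entropy index is the Shannon entropy (base 2) of each word's co-occurrence counts normalised to proportions; the exact formula used by T-LAB may differ, and the function name word_entropy and the counts are hypothetical.

```python
import math


def word_entropy(row):
    """Shannon entropy (in bits) of one word's co-occurrence counts."""
    total = sum(row.values())
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in row.values()
        if count > 0
    )


# Hypothetical rows of a word co-occurrence matrix.
matrix = {
    "change": {"climate": 5, "policy": 5, "energy": 5, "risk": 5},
    "dioxide": {"carbon": 19, "emission": 1},
}
entropies = {word: word_entropy(row) for word, row in matrix.items()}

# Average entropy over all rows, as a rough 'complexity' indicator.
average_entropy = sum(entropies.values()) / len(entropies)
```

A word whose links are spread evenly over many other words gets a high value, whereas a word tied almost exclusively to one partner gets a value close to zero.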

The ‘local’ CLUSTERING COEFFICIENT is a measure of the degree to which nodes in a graph tend to cluster together and to pair up with each other (i.e., something like ‘The friend of my friend is my friend.’). In other words, the clustering coefficient of a node (i.e., word) quantifies how close its neighbours (i.e., other words) are to being a tightly connected subgroup (i.e., a clique). It is computed as the proportion of the ‘actual’ connections among its neighbours compared with the number of all ‘possible’ connections among them. Its maximum value is ‘1’, and the average clustering coefficient of all nodes is sometimes also referred to as the ‘transitivity’ of the network.

N.B.: When a network has a large clustering coefficient and a small average path length, it can be considered a ‘small world’ (see Wikipedia).
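The ratio of ‘actual’ to ‘possible’ connections among the neighbours of a word can be sketched as follows for an undirected, unweighted graph. The graph and the function name local_clustering are hypothetical, and nodes with fewer than two neighbours are given a coefficient of 0 by convention.

```python
from itertools import combinations


def local_clustering(graph, node):
    """Share of neighbour pairs of `node` that are linked to each other."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: no pair of neighbours to compare
    actual = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    possible = k * (k - 1) / 2
    return actual / possible


# Hypothetical undirected word network as adjacency sets.
graph = {
    "climate": {"change", "policy", "energy"},
    "change": {"climate", "policy"},
    "policy": {"climate", "change"},
    "energy": {"climate"},
}
coefficients = {word: local_clustering(graph, word) for word in graph}
average = sum(coefficients.values()) / len(coefficients)
```

In this example only one of the three possible pairs among the neighbours of "climate" is linked, whereas the two neighbours of "change" are linked to each other, so that word reaches the maximum value.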

T-LAB 10.1 was released on December 16, 2022.

Here is a short list of the most significant improvements made in this version of the software.

1 – When using the Co-Word Analysis tool, a new interactive dendrogram is available which allows the user to explore the relationships between up to 3,000 (three thousand) key-words (see illustrations below).

2 – Now every network graph (force directed graph) available in the T-LAB tools allows the user to highlight the links of each target word on mouse over (see pictures below).

3 – Additionally, both when using the Co-Word Analysis tool and when using the Sequence Analysis tool, a new interactive graph allows the user to dynamically explore the relationship between key-words and the groups to which they belong (see below).

4 – In the new version the available options of the Graph Maker tool, which allows the user to explore various relationships between selected keywords, vary according to the analysis that has been performed previously (see below).

This means that, for example, after performing a thematic analysis or a topic analysis, the user can now explore each cluster/topic by using a 3D PCA (Principal Component Analysis).

5 – Additionally, each time the user edits/modifies a keyword list, they can now compute and export the TF-IDF values of all items included in the list (see pictures below). In this case, the procedure consists of two steps: (a) computation of the TF-IDF values of an N x M matrix, where ‘N’ are context units (either documents or elementary contexts) and ‘M’ are words (i.e., lemmas); (b) computation of the average TF-IDF value of every word within the ‘N’ context units.

6 – Finally, the interactive Word Trees option available through the Concordances tool allows the user to visualize the outputs in three different ways. In fact, each root word can be placed either on the left, on the right, or in the centre of its occurrence contexts (see pictures below).
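The two-step TF-IDF procedure described in point 5 above can be sketched as follows. The weighting is an assumption made for this example (term frequency as a proportion of the context length, inverse document frequency as the natural logarithm of N divided by the number of contexts containing the word); T-LAB may use a different variant. The function name average_tfidf and the sample contexts are hypothetical.

```python
import math
from collections import Counter


def average_tfidf(contexts):
    """Step (a): TF-IDF per context unit; step (b): mean over all N units."""
    n = len(contexts)
    # Number of context units in which each word appears.
    doc_freq = Counter()
    for words in contexts:
        doc_freq.update(set(words))

    totals = Counter()
    for words in contexts:
        for word, count in Counter(words).items():
            tf = count / len(words)             # relative frequency
            idf = math.log(n / doc_freq[word])  # 0 if the word is everywhere
            totals[word] += tf * idf
    # Contexts that lack the word contribute 0 to its average.
    return {word: totals[word] / n for word in doc_freq}


# Hypothetical context units reduced to lemmas.
contexts = [
    ["climate", "change", "climate"],
    ["climate", "policy"],
    ["energy", "policy"],
]
averages = average_tfidf(contexts)
```

With this weighting, a word that occurs in every context unit gets an average of zero, while a word concentrated in a few units gets a higher value.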
