T-LAB 10.2 - ON-LINE HELP

n-gram


In T-LAB an n-gram is a sequence of two (bi-gram) or more contiguous key words occurring within the same elementary context (i.e., a sentence, text fragment or paragraph).

When used for computing word co-occurrences, n-gram segmentation skips both stop words and punctuation marks.

Let's consider the following example:

The Citizens of each State shall be entitled to all Privileges and Immunities of Citizens in the several States.

Assuming that the seven highlighted items (Citizens, State, entitled, Privileges, Immunities, Citizens, States) are included in our key-term list and that automatic lemmatization has been applied, a bi-gram segmentation produces the following co-occurrence contexts:

citizen & state
state & entitle
entitle & privilege
privilege & immunity
immunity & citizen
citizen & state.

By contrast, a three-gram segmentation produces the following co-occurrence contexts:

citizen & state & entitle
state & entitle & privilege
entitle & privilege & immunity
privilege & immunity & citizen
immunity & citizen & state
citizen & state.
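The sliding-window segmentation illustrated above can be sketched in a few lines of Python. This is a minimal illustration, not T-LAB's actual implementation: the key-term list is taken from the example (stop words and punctuation already removed, terms already lemmatized), and the window is allowed to shrink at the end of the sequence so that the output mirrors the trailing "citizen & state" bi-gram shown in the three-gram list.

```python
def ngram_contexts(terms, n):
    """Slide a window of size n over the key terms.

    The final windows may be shorter than n, which reproduces the
    trailing bi-gram shown in the three-gram example above.
    """
    return [tuple(terms[i:i + n]) for i in range(len(terms) - 1)]


# Lemmatized key terms from the sample sentence
terms = ["citizen", "state", "entitle", "privilege",
         "immunity", "citizen", "state"]

for n in (2, 3):
    print(f"--- n = {n} ---")
    for gram in ngram_contexts(terms, n):
        print(" & ".join(gram))
```

Run as written, the n = 2 loop prints the six bi-gram contexts listed above, and the n = 3 loop prints the five three-gram contexts followed by the final, shorter "citizen & state" window.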


It is worth recalling that, when texts are segmented into elementary contexts, co-occurrences depend on the joint presence (or absence) of key words within the same context; whereas, with an n-gram segmentation, co-occurrences indicate a sequential relationship between words.
In T-LAB an n-gram based co-occurrence analysis can be performed with the advanced options of the Word Association tool; moreover, a Markovian analysis of bi-grams can be performed with the Sequence Analysis tool.