Thomson's Construction

July 02, 2025

Thomson’s construction (also called Thompson’s construction) is a method for converting a regular expression into a non-deterministic finite automaton (NFA). It is a fundamental technique used in compiler design and automata theory, especially for implementing regular expression matching.

🧠 Why Thomson’s Construction?

It systematically builds an NFA from any regular expression.
The resulting NFA accepts the same language as the regular expression.
This NFA may include ε-transitions (transitions that consume no input).
It is simple and suitable for implementation in regex engines and lexical analyzers.

📘 Basic Idea

Thomson's construction defines how to build small NFAs for basic symbols and how to combine them using:

Concatenation
Union (| or +)
Kleene Star (*)

Each construction has a fixed pattern and introduces new start and accept (final) states.

🔨 Building Blocks

Let’s define the NFA for the following regular expression components:

1. Single Symbol – `a`

2. Epsilon (ε)

🔧 Operators

3. Concatenation (R1 · R2)

If we have two NFAs:

NFA for R1: from q1 to q2
NFA for R2: from q3 to q4

Then for R1R2:

4. Union (R1 + R2 or R1 | R2)

Create a new start and final state, and connect them using ε-transitions:

ε from new start to R1 and R2's start
ε from R1 and R2’s final states to new final

5. Kleene Star (R)*

Create a new start and final state:

Add ε-transition from:
- new start to R's start
- R's final to new final
- R's final back to R's start
- new start directly to final (for ε)

✅ Advantages of Thompson's Construction

Straightforward and mechanical
Resulting NFA is guaranteed to be correct and accept the regular language
The construction always results in an NFA with a number of states linear in the size of the regular expression

⚠️ Disadvantages

The resulting NFA has ε-transitions, which are not efficient for direct simulation
Need ε-closure computation for NFA simulation or conversion to DFA
The NFA may be non-deterministic, so not ideal for some applications unless converted to DFA

🧾 Use in Practice

Used in lexical analyzers (e.g., Lex)
Forms the basis for converting regex into state machines in compilers
Forms the first step in converting regular expressions to DFA (via ε-NFA → NFA → DFA)

Search This Blog

Theory Of Computation PCCST302 KTU Semester 3 BTech 2024 Scheme