Thomson's Construction
Thomson’s construction (also called Thompson’s construction) is a method for converting a regular expression into a non-deterministic finite automaton (NFA). It is a fundamental technique used in compiler design and automata theory, especially for implementing regular expression matching.
🧠 Why Thomson’s Construction?
It systematically builds an NFA from any regular expression.
The resulting NFA accepts the same language as the regular expression.
This NFA may include ε-transitions (transitions that consume no input).
It is simple and suitable for implementation in regex engines and lexical analyzers.
📘 Basic Idea
Thomson's construction defines how to build small NFAs for basic symbols and how to combine them using:
Concatenation
Union (| or +)
Kleene Star (*)
Each construction has a fixed pattern and introduces new start and accept (final) states.
🔨 Building Blocks
Let’s define the NFA for the following regular expression components:
1. Single Symbol – a
2. Epsilon (ε)
🔧 Operators
3. Concatenation (R1 · R2)
If we have two NFAs:
NFA for R1: from q1 to q2
NFA for R2: from q3 to q4
Then for R1R2:
4. Union (R1 + R2 or R1 | R2)
Create a new start and final state, and connect them using ε-transitions:
ε from new start to R1 and R2's start
ε from R1 and R2’s final states to new final
5. Kleene Star (R)*
Create a new start and final state:
Add ε-transition from:
new start to R's start
R's final to new final
R's final back to R's start
new start directly to final (for ε)
✅ Advantages of Thompson's Construction
Straightforward and mechanical
Resulting NFA is guaranteed to be correct and accept the regular language
The construction always results in an NFA with a number of states linear in the size of the regular expression
⚠️ Disadvantages
The resulting NFA has ε-transitions, which are not efficient for direct simulation
Need ε-closure computation for NFA simulation or conversion to DFA
The NFA may be non-deterministic, so not ideal for some applications unless converted to DFA
🧾 Use in Practice
Used in lexical analyzers (e.g., Lex)
Forms the basis for converting regex into state machines in compilers
Forms the first step in converting regular expressions to DFA (via ε-NFA → NFA → DFA)
Comments
Post a Comment