Thomson's Construction

Thomson’s construction (also called Thompson’s construction) is a method for converting a regular expression into a non-deterministic finite automaton (NFA). It is a fundamental technique used in compiler design and automata theory, especially for implementing regular expression matching.


🧠 Why Thomson’s Construction?

  • It systematically builds an NFA from any regular expression.

  • The resulting NFA accepts the same language as the regular expression.

  • This NFA may include ε-transitions (transitions that consume no input).

  • It is simple and suitable for implementation in regex engines and lexical analyzers.


📘 Basic Idea

Thomson's construction defines how to build small NFAs for basic symbols and how to combine them using:

  • Concatenation

  • Union (| or +)

  • Kleene Star (*)

Each construction has a fixed pattern and introduces new start and accept (final) states.


🔨 Building Blocks

Let’s define the NFA for the following regular expression components:

1. Single Symbola



2. Epsilon (ε)


🔧 Operators

3. Concatenation (R1 · R2)

If we have two NFAs:

  • NFA for R1: from q1 to q2

  • NFA for R2: from q3 to q4

Then for R1R2:




4. Union (R1 + R2 or R1 | R2)

Create a new start and final state, and connect them using ε-transitions:

   

  • ε from new start to R1 and R2's start

  • ε from R1 and R2’s final states to new final


5. Kleene Star (R)*

Create a new start and final state:

  • Add ε-transition from:

    • new start to R's start

    • R's final to new final

    • R's final back to R's start

    • new start directly to final (for ε)

   


Advantages of Thompson's Construction

  • Straightforward and mechanical

  • Resulting NFA is guaranteed to be correct and accept the regular language

  • The construction always results in an NFA with a number of states linear in the size of the regular expression


⚠️ Disadvantages

  • The resulting NFA has ε-transitions, which are not efficient for direct simulation

  • Need ε-closure computation for NFA simulation or conversion to DFA

  • The NFA may be non-deterministic, so not ideal for some applications unless converted to DFA


🧾 Use in Practice

  • Used in lexical analyzers (e.g., Lex)

  • Forms the basis for converting regex into state machines in compilers

  • Forms the first step in converting regular expressions to DFA (via ε-NFA → NFA → DFA)


🔁 Example: Regular Expression a(a + b)*ab

🔁 Example: Regular Expression a+b+ab



🔁 Example: Regular Expression identifiers: letter (letter+digit)*


🔁 Example: Regular Expression identifiers: (0+1)*1(0+1)

🔁 Example: Regular Expression identifiers: (a+b)*a




Comments

Popular posts from this blog

Theory Of Computation PCCST302 KTU Semester 3 BTech 2024 Scheme

Non deterministic Finite Automata NFA

Example DFAs University Questions