Chomsky Normal Form (CNF)

 

📘 Chomsky Normal Form (CNF)

✅ Definition

A context-free grammar (CFG) is said to be in Chomsky Normal Form if every production rule is of the form:

  1. Two nonterminals:

    A    BCA \;\to\; BC

    where A,B,CA, B, Care variables (nonterminals) and neither BB nor CC is the start symbol.

  2. Single terminal:

    A    aA \;\to\; a

    where aa is a terminal symbol.

  3. Optional ε-production (only if the language includes ε):

    S    εS \;\to\; \varepsilon

    allowed only if SS is the start symbol and SS does not appear on the right-hand side of any rule.


❌ What is NOT allowed

  • Rules like Aε (except possibly for start symbol).

  • Rules like AB (unit productions).

  • Rules like AaB (mix of terminal + nonterminal).

  • Rules with more than 2 nonterminals on the RHS (e.g., ABCD).

  • Rules with more than 1 terminal on the RHS (e.g., Aab).


🌟 Example 1: Grammar in CNF

A→ABBC
ABAa
BCCb
CABa

✔ All productions are either of type ABC or Aa
So this grammar is in CNF.


🚫 Example 2: Grammar NOT in CNF

    SAAS
    Aaa
  • SAAS ❌ not allowed (3 symbols on RHS).

  • Aaa ❌ not allowed (2 terminals together).


🛠 Why CNF is Special

  1. Simplifies parsing algorithms (like CYK algorithm).

  2. Provides a canonical form → easier to prove properties of CFGs.

  3. Ensures maximum standardization: every CFG can be converted into an equivalent CNF (except for ε-productions).

📘 Theorem (CNF Conversion)

Theorem:
Any context-free grammar G=(V,T,S,P) with λL(G) has an equivalent grammar in Chomsky Normal Form (CNF).


✨ Proof 

We want to systematically rewrite G into an equivalent grammar Gin CNF.

🔹 Step 0: Simplification

any CFG can be converted to one without:

  • λ-productions (except possibly Sλ)

  • unit-productions

  • useless symbols

So we assume G is already simplified.


🔹 Step 1: Remove terminals from long productions

Every production has the form:

Ax1x2xn,n1

where each xiVT

  • If n=1n=1, then x1x_1 must be a terminal aa. Keep Aa

  • If n2:

    • For each terminal aT, introduce a new variable Ba

    • Replace aa in the rule with Ba

    • Add the rule Baa

✅ At the end, every production looks like:

  • AC1C2CnA \to C_1 C_2 \cdots C_n, where each CiV1(no terminals mixed in).

  • Or Aa


🔹 Step 2: Break down long right-hand sides

Now, if we still have rules like AC1C2C3Cwith n3:

Introduce new variables D1,D2,D_1, D_2, \dotsto "chain" the production into binary form:

AC1D1,D1C2D2,,Dn2Cn1Cn

✅ At the end, every production is either:

  • ABC (two variables), or

  • Aa (a single terminal).

Thus, the grammar is in CNF.


Example:

S → A B a
A → a a b
B → A c

1) Identify terminals and nonterminals

Nonterminals: S, A, B
Terminals: a, b, c

No ε-productions or unit-productions are present, so we can proceed.

2) Replace terminals that occur in productions of length ≥ 2

Introduce new nonterminals that produce single terminals:

X_a → a X_b → b X_c → c

Replace occurrences of terminals in RHS of length > 1:

  • S → A B aS → A B X_a

  • A → a a bA → X_a X_a X_b

  • B → A cB → A X_c

Now every RHS contains only nonterminals, but some RHSs are length 3.

3) Binarize RHSs of length > 2

Replace length-3 RHSs with chains of binary productions by introducing new helper nonterminals.

For A → X_a X_a X_b introduce D1:

A → X_a D1 D1 → X_a X_b

For S → A B X_a introduce D2:

S → A D2 D2 → B X_a

B → A X_c is already binary (A and X_c), so keep it.

4) Final grammar in CNF

Nonterminals: S, A, B, D1, D2, X_a, X_b, X_c
Terminals: a, b, c

Productions (all in CNF form A → BC or A → a):

S → A D2 D2 → B X_a A → X_a D1 D1 → X_a X_b BA X_c X_a → a X_b → b X_c → c

Each production is either two nonterminals on the right (A → BC) or a single terminal (X_a → a, etc.), so the grammar is in CNF. This grammar generates the same language as the original one (no λ in the original language), so the conversion is correct.

Comments

Popular posts from this blog

Theory Of Computation PCCST302 KTU Semester 3 BTech 2024 Scheme

Non deterministic Finite Automata NFA

Formal Definition - Turing Machine