Chomsky Normal Form (CNF)
📘 Chomsky Normal Form (CNF)
✅ Definition
A context-free grammar (CFG) is said to be in Chomsky Normal Form if every production rule is of the form:
-
Two nonterminals:
where are variables (nonterminals) and neither nor is the start symbol.
-
Single terminal:
where is a terminal symbol.
-
Optional ε-production (only if the language includes ε):
allowed only if is the start symbol and does not appear on the right-hand side of any rule.
❌ What is NOT allowed
-
Rules like (except possibly for start symbol).
-
Rules like (unit productions).
-
Rules like (mix of terminal + nonterminal).
-
Rules with more than 2 nonterminals on the RHS (e.g., ).
-
Rules with more than 1 terminal on the RHS (e.g., ).
🌟 Example 1: Grammar in CNF
✔ All productions are either of type or
So this grammar is in CNF.
🚫 Example 2: Grammar NOT in CNF
❌ not allowed (3 symbols on RHS).
-
❌ not allowed (2 terminals together).
🛠 Why CNF is Special
-
Simplifies parsing algorithms (like CYK algorithm).
-
Provides a canonical form → easier to prove properties of CFGs.
-
Ensures maximum standardization: every CFG can be converted into an equivalent CNF (except for ε-productions).
📘 Theorem (CNF Conversion)
Theorem:
Any context-free grammar with has an equivalent grammar in Chomsky Normal Form (CNF).
✨ Proof
We want to systematically rewrite into an equivalent grammar in CNF.
🔹 Step 0: Simplification
any CFG can be converted to one without:
-
λ-productions (except possibly )
-
unit-productions
-
useless symbols
So we assume is already simplified.
🔹 Step 1: Remove terminals from long productions
Every production has the form:
where each
-
If , then must be a terminal . Keep
-
If :
-
For each terminal , introduce a new variable
-
Replace in the rule with
-
Add the rule
-
✅ At the end, every production looks like:
-
, where each (no terminals mixed in).
-
Or
🔹 Step 2: Break down long right-hand sides
Now, if we still have rules like with :
Introduce new variables to "chain" the production into binary form:
✅ At the end, every production is either:
-
(two variables), or
-
(a single terminal).
Thus, the grammar is in CNF.
Example:
1) Identify terminals and nonterminals
Nonterminals: S, A, B
Terminals: a, b, c
No ε-productions or unit-productions are present, so we can proceed.
2) Replace terminals that occur in productions of length ≥ 2
Introduce new nonterminals that produce single terminals:
Replace occurrences of terminals in RHS of length > 1:
-
S → A B a
⇒S → A B X_a
-
A → a a b
⇒A → X_a X_a X_b
-
B → A c
⇒B → A X_c
Now every RHS contains only nonterminals, but some RHSs are length 3.
3) Binarize RHSs of length > 2
Replace length-3 RHSs with chains of binary productions by introducing new helper nonterminals.
For A → X_a X_a X_b
introduce D1
:
For S → A B X_a
introduce D2
:
B → A X_c
is already binary (A
and X_c
), so keep it.
4) Final grammar in CNF
Nonterminals: S, A, B, D1, D2, X_a, X_b, X_c
Terminals: a, b, c
Productions (all in CNF form A → BC
or A → a
):
Each production is either two nonterminals on the right (A → BC
) or a single terminal (X_a → a
, etc.), so the grammar is in CNF. This grammar generates the same language as the original one (no λ in the original language), so the conversion is correct.
Comments
Post a Comment