Removing useless symbols and productions
1. What are “useless symbols”?
A useless symbol in a CFG is a non-terminal or terminal that does not contribute to deriving any string of terminals from the start symbol.
There are two types of useless symbols:
-
Non-generating symbols
-
Symbols that cannot derive any terminal string.
-
Example: If a non-terminal never leads to terminals, it is useless.
-
-
Non-reachable symbols
-
Symbols that cannot be reached from the start symbol.
-
Even if they could generate terminals, if the start symbol never reaches them, they are useless.
-
We remove both types to simplify the grammar.
2. Step 1: Remove Non-Generating Symbols
Algorithm :
-
Mark all terminal symbols as generating.
-
Iteratively mark non-terminals A as generating if there exists a production:
where every symbol in α is already generating.
-
Repeat until no new generating symbols are found.
-
Remove all non-generating symbols and any production containing them.
Example
CFG:
Step 1: Identify generating symbols
-
Terminals:
a, b, c
→ generating -
Check non-terminals:
-
B → b
→ B is generating -
A → aA
(a is terminal, A? not yet generating) → keep checking -
A → B
→ B generating → A is generating -
S → AB | a
→ both A & B generating → S generating -
C → cC
→ c is terminal, C? C depends on itself → never fully generating → C is non-generating
-
Remove C and its productions:
3. Step 2: Remove Non-Reachable Symbols
Algorithm :
-
Start with the start symbol S as reachable.
-
Iteratively mark symbols reachable if they appear on the right-hand side of a production of a reachable non-terminal.
-
Remove all non-reachable symbols and their productions.
Example (continuing)
CFG after Step 1:
-
Start with
S
→ reachable -
S → AB
→ A, B reachable -
A → aA | B
→ A and B already reachable -
B → b
→ B already reachable
✅ All symbols are reachable → nothing removed in this step.
Concept of a Dependency Graph
-
Vertices: Each vertex represents a variable (non-terminal) in the CFG.
-
Edges: There is an edge from C → D if C has a production containing D, i.e., , where are strings of terminals and/or variables.
-
Interpretation:
-
If there is a path from S (start symbol) to some variable V, then V is reachable.
-
If no path exists, V is unreachable → useless → remove it.
-
This is essentially a graph reachability problem.
Consider the Grammer
S->.aS|AA->a
B->aa
A->a
4. Final Simplified CFG
-
Useless symbol
C
removed. -
Productions only involve symbols contributing to terminal strings derivable from
S
.
5. Summary of Steps
-
Remove non-generating symbols
-
Symbols that can never produce a terminal string.
-
-
Remove non-reachable symbols
-
Symbols that the start symbol cannot reach.
-
-
Result: A simplified CFG generating exactly the same language.
6. Application to NPDA → CFG
After converting an NPDA to a CFG:
-
Many non-terminals like
[q_i X q_j]
may never appear in derivations from S → non-reachable → remove. -
Some non-terminals may never derive terminals, e.g., intermediate states that cannot pop the stack fully → non-generating → remove.
✅ This makes the NPDA-derived CFG much smaller and manageable, while still generating the same language.
Comments
Post a Comment