Pumping Lemma For Context Free Languages

June 14, 2025

Pumping Lemma for Context-Free Languages

Let $L$ be an infinite context-free language.
Then there exists some positive integer $m$ such that any

$\omega \in L \text{ with } |\omega| \ge m$

can be decomposed as

$\omega = uvxyz$

such that

$|v x y| \le m$ , Length of the middle section is bounded
$|v y| \ge 1$ , Pumped portion is non-empty
$u v^i x y^i z \in L$ for all $i = 0, 1, 2, \dots$ , Pumping preserves membership
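
To make the statement concrete, here is a minimal Python sketch. The helper names `pump` and `in_anbn`, and the example language $\{a^n b^n : n \ge 1\}$, are illustrative assumptions and not part of the lemma itself.

```python
def pump(u: str, v: str, x: str, y: str, z: str, i: int) -> str:
    """Return u v^i x y^i z for a given decomposition."""
    return u + v * i + x + y * i + z


def in_anbn(s: str) -> bool:
    """Membership test for the example language {a^n b^n : n >= 1}."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


# Decompose w = aabb as u = "a", v = "a", x = "", y = "b", z = "b".
u, v, x, y, z = "a", "a", "", "b", "b"
assert pump(u, v, x, y, z, 1) == "aabb"

# Here |vy| = 2 >= 1, and every pumped string stays in the language.
pumped = [pump(u, v, x, y, z, i) for i in range(4)]
print(pumped)  # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

Note that $i = 0$ deletes $v$ and $y$ entirely, which is why the lemma speaks of strings being "repeated or deleted".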

Intuitive Explanation

The lemma tells us that long enough strings in any context-free language must contain a portion that can be "pumped" (repeated or deleted) and still produce valid strings in the same language.
This happens because a context-free grammar has only finitely many variables, yet it can generate arbitrarily long strings.
So, when we derive a very long string, its parse tree must be tall, and some variable must repeat along some root-to-leaf path in the tree.
This repeated variable allows us to repeat or remove the corresponding parts of the string, which gives the “pumping” property.

Proof

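Consider the language $L - \{\lambda\}$ , and let $G = (V, T, S, P)$ be a context-free grammar for $L - \{\lambda\}$ without λ-productions or unit-productions. Such a grammar always exists, and dropping $\lambda$ does not affect the lemma, which only concerns strings of length at least $m \ge 1$ .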

Let the maximum length of the right-hand side of any production in $G$ be $k$ .
That is, for each production $A \to \alpha \in P$ , we have $|\alpha| \le k$ .

Since $L$ is infinite, it contains arbitrarily long strings, and hence $G$ has derivation trees of arbitrarily large height.
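
The claim that these trees must be tall rests on a counting fact: a tree of height $h$ in which every node has at most $k$ children has at most $k^h$ leaves. The sketch below (the function name `min_height` is hypothetical) computes the smallest height that could accommodate a string of a given length.

```python
def min_height(length: int, k: int) -> int:
    """Smallest h with k**h >= length.

    This is a lower bound on the height of any derivation tree with
    branching factor at most k whose yield has `length` terminals.
    """
    if length < 1 or k < 2:
        raise ValueError("expects length >= 1 and k >= 2")
    h = 0
    while k ** h < length:
        h += 1
    return h


# With k = 2, a string of 9 terminals needs a tree of height at least 4,
# because a tree of height 3 has at most 2**3 = 8 leaves.
print(min_height(9, 2))  # 4
```

As the string length grows without bound, so does this lower bound on the height.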

Step 1: Repetition of a variable along a path

Let the number of variables in $G$ be $n$ .
In any derivation tree of height greater than $n$ ,
there must be some variable that repeats on a single path from the root to a leaf (by the pigeonhole principle).

Suppose a variable $A$ occurs twice along a path, once near the top and once lower down.

Schematically, the derivation has the form:

$S \Rightarrow^* u A z \Rightarrow^* u v A y z \Rightarrow^* u v x y z$

where $u, v, x, y, z$ are strings of terminals.
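
The pigeonhole step can be sketched in a few lines of Python. The function `first_repeat` and the three-variable example path are assumptions made for illustration; a path is represented simply as the list of variables met from the root downwards.

```python
from typing import Dict, List, Optional, Tuple


def first_repeat(path: List[str]) -> Optional[Tuple[int, int]]:
    """Return the depths (upper, lower) of the first variable that occurs
    twice on a root-to-leaf path, or None if all variables are distinct."""
    seen: Dict[str, int] = {}
    for depth, var in enumerate(path):
        if var in seen:
            return seen[var], depth
        seen[var] = depth
    return None


# Hypothetical grammar with n = 3 variables {S, A, B}:
# a path carrying 4 variable nodes must contain a repetition.
path = ["S", "A", "B", "A"]
print(first_repeat(path))  # (1, 3)
```

With only $n$ distinct variables available, any path holding more than $n$ variable nodes makes this function return a pair.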

Step 2: Structure of the repeated variable

From this derivation we see:

$A \Rightarrow^* v A y$

and

$A \Rightarrow^* x$

Thus, the variable $A$ can derive a string containing itself (surrounded by $v$ and $y$ ),
and it can also derive a string $x$ consisting only of terminals.

This gives the pattern that allows “pumping” of the substrings $v$ and $y$ .

Step 3: Pumping mechanism

Replacing the lower occurrence of $A$ with its derivation repeatedly yields:

$S \Rightarrow^* u A z \Rightarrow^* u v A y z \Rightarrow^* u v v A y y z \Rightarrow^* \cdots$

Finally, replacing the last $A$ by $x$ gives the sequence of strings:

$u v^i x y^i z, \quad i = 0, 1, 2, \ldots$

Each of these strings is generated by $G$ , and therefore belongs to $L$ .
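
The pumping mechanism can be simulated as a sequence of sentential forms. This is a minimal sketch under stated assumptions: the helper `derive` is hypothetical, the variable is written as a single uppercase character that does not occur in the terminal strings, and the example grammar $S \to aSb \mid ab$ is chosen only for illustration (so the repeated variable is $S$, with $u = z = \lambda$, $v = a$, $y = b$, $x = ab$).

```python
from typing import List


def derive(u: str, v: str, x: str, y: str, z: str, i: int, var: str = "A") -> List[str]:
    """Return the sentential forms obtained from u A z by applying
    A =>* v A y exactly i times and then A =>* x.

    Assumes `var` is one character that appears in none of u, v, x, y, z.
    """
    form = u + var + z
    steps = [form]
    for _ in range(i):
        form = form.replace(var, v + var + y, 1)  # pump once
        steps.append(form)
    form = form.replace(var, x, 1)                # finish with A =>* x
    steps.append(form)
    return steps


# Example grammar S -> aSb | ab, pumping twice.
print(derive("", "a", "ab", "b", "", 2, var="S"))
# ['S', 'aSb', 'aaSbb', 'aaabbb']
```

The final form always equals $u v^i x y^i z$; taking $i = 0$ skips the loop and yields $u x z$.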

Step 4: Bounds on the substring lengths

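Write $n$ for the number of variables of $G$ and $k$ for the maximum length of a right-hand side, as above. A derivation tree of height $h$ has at most $k^h$ leaves, so any string of length at least $m = k^{n+1}$ has a derivation tree of height greater than $n$ (here $k \ge 2$ , since otherwise $L$ would be finite). On a longest root-to-leaf path of such a tree, choose the two occurrences of $A$ among the lowest $n + 1$ variables. The subtree rooted at the upper occurrence then has height at most $n + 1$ , so its yield $v x y$ has length at most $k^{n+1} = m$ .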

Thus, we can ensure that:

$|v x y| \le m$

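and, because $G$ has no λ-productions or unit-productions, $v$ and $y$ cannot both be empty (the first production used in $A \Rightarrow^* v A y$ has a right-hand side of length at least 2, and every symbol in it derives a non-empty string):

$|v y| \ge 1$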

Step 5: Conclusion

Hence, for every $\omega \in L$ with $|\omega| \ge m$ ,
we can decompose it as $\omega = u v x y z$ satisfying:

$|v x y| \le m$
$|v y| \ge 1$
$u v^i x y^i z \in L$ , for all $i \geq 0$

This completes the proof.

🧠 Key Takeaways

Property	Explanation
Purpose	The pumping lemma for CFLs is used to prove that a language is not context-free.
Based on	Repetition of variables in long derivations (parse tree argument).
Limitations	It gives a necessary, but not sufficient, condition — i.e., some non-CFLs may still satisfy it.
Main idea	Strings in a CFL can be “pumped” in two places (v,y) while remaining valid.
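
As an illustration of the "Purpose" row, the sketch below searches by brute force for a decomposition of $a^m b^m c^m$ that survives pumping. The function names, the choice $m = 4$, and the decision to test only $i = 0, 1, 2$ are assumptions made for this example. An empty result shows that no decomposition works for this particular $m$; an actual proof of non-context-freeness must argue this for every $m$.

```python
from typing import Callable, List, Tuple


def in_anbncn(s: str) -> bool:
    """Membership test for {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def surviving_decompositions(
    w: str, m: int, member: Callable[[str], bool], max_i: int = 2
) -> List[Tuple[str, str, str, str, str]]:
    """All (u, v, x, y, z) with w = uvxyz, |vxy| <= m, |vy| >= 1 such that
    u v^i x y^i z passes `member` for every i in 0..max_i."""
    found = []
    n = len(w)
    for a in range(n + 1):                          # u = w[:a]
        for b in range(a, n + 1):                   # v = w[a:b]
            for c in range(b, n + 1):               # x = w[b:c]
                for d in range(c, min(n, a + m) + 1):  # y = w[c:d], |vxy| = d - a
                    u, v, x, y, z = w[:a], w[a:b], w[b:c], w[c:d], w[d:]
                    if not v and not y:
                        continue                    # violates |vy| >= 1
                    if all(member(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        found.append((u, v, x, y, z))
    return found


m = 4
w = "a" * m + "b" * m + "c" * m
print(surviving_decompositions(w, m, in_anbncn))  # []
```

The search comes back empty because $v x y$ can touch at most two of the three letter blocks, so deleting $v$ and $y$ (the case $i = 0$) always leaves the counts unequal.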

Theory Of Computation PCCST302 KTU Semester 3 BTech 2024 Scheme