[LAC] Context Free Grammars

We will now introduce context-free grammars (CFGs) as a formalism to define languages. CFGs are more general than regular expressions, i.e. there are more languages definable by CFGs.

Key idea: Rules, called productions, that describe how symbols called non-terminals(or variables, or syntactic categories) can be replaced by non-terminals and terminals until only terminals left.

so what is CFG?

A context-free grammar G = (N, T, P, S) is given by

  • A finite set N of nonterminal symbols or nonterminals
  • A finite set T of terminal symbols or terminals
  • N∩T = ∅, i.e., the sets N and T are disjoint
  • S ∈ N: the distinguished start symbol
  • A finite set P ⊆ N ×(N ∪T )∗ of productions. A production (A, α), where A ∈ N and α ∈ (N ∪ T )∗ is a sequence of nonterminal and terminal symbols, is written as A → α in the following.(P is what we called production before)

Non-terminal are also referred to as variables, and consequently the set of non-terminals is sometimes is sometimes denoted by V.
The terminals are the alphabet of the language defined by a context-free grammar , and for that reason the set of terminals is sometimes denoted by Σ.

The directly derives relation

enter image description here

enter image description here

if we have a “*” (star) on the transition arrow, this means there a several potential transition:
enter image description here

language generated by a grammar

enter image description here

Derivation tree

A tree is a derivation tree for a CFG G = (N,T,P,S) iff:
1. Every node has a label from N ∪T ∪{ε}.
2. The label of the root node is S.
3. Labels of interior nodes belong to N
4. If a node n has label A and nodes n1,n2, …, nk are children of n, from left to right, with labels X1, X2, …, Xk, respectively, then A->X1X2…Xk is a production in P.
5. If a node n has label ε, then n is a leaf and the only child of its parent.

The string of leaf labels read from left to right, eliding any ε, constitute the yield of the tree.

The yield of a parse tree

If we look at the leaves of any parse tree and concatenate them from the left, we get a string, called the yield of the tree, which is always a string that is derived from the root variable.
enter image description here

enter image description here

Ambiguity

We say that a grammar G is ambiguous if there is a word w ∈ L(G) for which there is more than one parse tree, or, equivalently, more than one leftmost derivation or more than one rightmost derivation. This is usually a bad thing because it entails that there is more than one way to interpret a word (i.e. it leads to semantical ambiguity).
enter image description here

Exercises

enter image description here
Answer: S -> 0S1 | 01

enter image description here
The set of all strings over alphabet a,b not of the form ww for any w.
No strings of odd length can be of the form ww . We use the terminal
symbols A and B to generate all odd-length strings where the center
characters are a and b, respectively.
S -> AB|BA|A|B
A -> aAa|aAb|bAa|bAb|a
B -> aBa|aBb|bBa|bBb|b

To understand why AB and BA are guaranteed to generate strings not of the form ww,
understand that A builds an odd-length string, say of length 2k + 1, where k characters
precede an a which preceds k more characters.
Similarly, B generates a string, say of length 2m + 1,
where m a’s and b’s precede one b preceds m more a’s and b’s. If we
concatenate one to another, we get a string of length 2k + 2m + 2:an even
length string-where the middle a of the portion generated by A is
k + m + 1 characters away from the central b generated by B.
That means that the first k + m + 1 characters cannot
be fully replicated in the final k + m + 1 characters.

[This answer taken from Jerry Cain at Stanford]

评论

此博客中的热门博文

[MLE] W2 Multivariate linear regression

[MLE] W1 Introduction

[AIM] MetaHeuristics