lecture17

LING/C SC/PSYC 438/538
Lecture 17
Sandiway Fong
Last Time
• Talked about:
– 1. Declarative (logical) reading of grammar rules
– 2. Prolog query: s(String,[]).
• Case 1. String is known: Is String ∈ L(G)?
• Case 2. String is unknown: Enumerate L(G)
– 3. Different search strategies
• Prolog's (left-to-right) depth-first search
• Iterative deepening
Beyond Regular Languages
• Beyond regular languages
– aⁿbⁿ = {ab, aabb, aaabbb, aaaabbbb, ... }, n≥1
– is not a regular language
• That means no FSA, RE or RG can be built for this set
1. We only have a finite number of states to play with …
2. We’re only allowed simple free iteration (looping)
3. Pumping Lemma proof
Beyond Regular Languages
• Language
– aⁿbⁿ = {ab, aabb, aaabbb, aaaabbbb, ... }, n≥1
• A regular grammar extended to allow both left and right recursive rules can accept/generate it:
1. a --> [a], b.
2. b --> [b].
3. b --> a, [b].
• Example:
– [figure: set membership query]
– [figure: set enumeration query]
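The two example queries can be sketched as follows. This is a minimal sketch that assumes SWI-Prolog and reuses the slide's three rules unchanged; the queries follow the s(String,[]) pattern from the previous lecture, with a as the start symbol.

```prolog
% The extended grammar from the slide, written as a DCG.
a --> [a], b.
b --> [b].
b --> a, [b].

% Set membership: the string is known.
% ?- a([a,a,b,b], []).     % true
% ?- a([a,b,b], []).       % false
%
% Set enumeration: the string is unknown.
% ?- a(L, []).             % L = [a,b] ; L = [a,a,b,b] ; ...
```

Rule 3 calls a before it looks for [b], and rule 1 always consumes an a first, so a membership query on a known string terminates.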
Beyond Regular Languages
• Language
– aⁿbⁿ = {ab, aabb, aaabbb, aaaabbbb, ... }, n≥1
• A regular grammar extended to allow both left and right recursive rules can accept/generate it:
1. a --> [a], b.
2. b --> [b].
3. b --> a, [b].
• Intuition:
– grammar implements the stacking of partial trees balanced for a's and b's
– [figure: stacked partial trees over nonterminals A, B and terminals a, b]
Beyond Regular Languages
• Language
– aⁿbⁿ = {ab, aabb, aaabbb, aaaabbbb, ... }, n≥1
• A regular grammar extended to allow both left and right recursive rules can accept/generate it:
1. a --> [a], b.
2. b --> [b].
3. b --> a, [b].
• A type-2 or context-free grammar (CFG) has no restrictions on what can go on the RHS of a grammar rule
• Note:
– CFGs still have a single nonterminal limit for the LHS of a rule
• Example:
1. s --> [a], [b].
2. s --> [a], s, [b].
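The context-free example can be loaded and queried directly. This is a minimal sketch assuming SWI-Prolog; the rules are the slide's, and the queries are illustrative.

```prolog
% CFG for a^n b^n, n >= 1: the recursive rule wraps one a ... b pair
% around a smaller string of the same language.
s --> [a], [b].
s --> [a], s, [b].

% ?- s([a,a,a,b,b,b], []).   % true
% ?- s([a,a,b], []).         % false
% ?- s(L, []).               % L = [a,b] ; L = [a,a,b,b] ; ...
```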
Extra Argument: Parse Tree
• Recovering a parse tree
– when we want Prolog to return more than just true/false answers
– in case of true, we can compute a syntax tree representation of the parse
– by adding an extra argument to nonterminals
– applies to all grammar rules (not just regular grammars)
Example
• sheeptalk again
• DCG (non-regular, context-free):
s --> [b], [a], a, [!].
a --> [a]. (base case)
a --> [a], a. (recursive case)
• [figure: parse tree with root s for the string b a a a !]
Extra Argument: Parse Tree
• Tree:
– [figure: parse tree with root s for the string b a a a !]
• Prolog data structure:
– term
– hierarchical
– allows sequencing of arguments
– functor(arg1,..,argn)
– each argi could be another term or simple atom
• Example: s(b,a,a(a,a(a)),!)
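To make the term structure concrete, the tree term can be taken apart with standard built-ins. This is a minimal sketch assuming SWI-Prolog; tree/1 and yield/2 are hypothetical helper names introduced here for illustration and are not part of the lecture.

```prolog
% The parse tree from the slide, stored as a fact (hypothetical name tree/1).
tree(s(b,a,a(a,a(a)),!)).

% yield(+Tree, -Words): collect the leaves of a tree term, left to right.
% Hypothetical helper: an atom is its own yield; a compound term's yield
% is the concatenation of the yields of its arguments.
yield(T, [T]) :- atomic(T).
yield(T, Ws) :-
    compound(T),
    T =.. [_Functor|Args],
    maplist(yield, Args, Wss),
    append(Wss, Ws).

% ?- tree(T), functor(T, Name, Arity).   % Name = s, Arity = 4
% ?- tree(T), arg(3, T, Sub).            % Sub = a(a,a(a))
% ?- tree(T), yield(T, Ws).              % Ws = [b,a,a,a,!]
```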
Extra Arguments: Parse Tree
• DCG
– s --> [b],[a], a, [!].
– a --> [a]. (base case)
– a --> [a], a. (right recursive case)
• [figure: parse tree for b a a a !, written as the term s(b,a,a(a,a(a)),!)]
• Idea: for each nonterminal, add an argument to store its subtree
• base case
– a --> [a].
– a(subtree) --> [a].
– a(a(a)) --> [a].
• recursive case
– a --> [a], a.
– a(subtree) --> [a], a(subtree).
– a(a(a,A)) --> [a], a(A).
Extra Arguments: Parse Tree
• Prolog grammar
– s --> [b], [a], a, [!].
– a --> [a]. (base case)
– a --> [a], a. (right recursive case)
• [figure: parse tree for b a a a !, written as the term s(b,a,a(a,a(a)),!)]
• base and recursive cases
– a(a(a)) --> [a].
– a(a(a,A)) --> [a], a(A).
• start symbol case
– s --> [b], [a], a, [!].
– s(tree) --> [b], [a], a(subtree), [!].
– s(s(b,a,A,!) ) --> [b], [a], a(A), [!].
Extra Arguments: Parse Tree
• Prolog grammar
– s --> [b], [a], a, [!].
– a --> [a]. (base case)
– a --> [a], a. (right recursive case)
• Equivalent Prolog grammar computing a parse
– s(s(b,a,A,!)) --> [b], [a], a(A), [!].
– a(a(a)) --> [a].
– a(a(a,A)) --> [a], a(A).
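The parse-computing grammar can be tried as is. This is a minimal sketch assuming SWI-Prolog; the extra argument comes first in the translated predicate, so the query has the form s(Tree, String, []).

```prolog
% Sheeptalk grammar with one extra argument per nonterminal holding its subtree.
s(s(b,a,A,!)) --> [b], [a], a(A), [!].
a(a(a))       --> [a].          % base case
a(a(a,A))     --> [a], a(A).    % right recursive case

% ?- s(T, [b,a,a,a,!], []).     % T = s(b,a,a(a,a(a)),!)
% ?- s(T, [b,a,!], []).         % false: the grammar needs at least two a's
```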
Extra Arguments
• Extra arguments are powerful
– they allow us to impose (grammatical) constraints and change the expressive power of the system
• (if used as memory)
• Example:
– aⁿbⁿcⁿ, n>0, is not context-free (it is context-sensitive)
Extra arguments
• A context-free grammar (CFG) + extra argument (EA) for the context-sensitive language {aⁿbⁿcⁿ | n>0}:
1. s(s(A,A,A)) --> a(A), b(A), c(A).
2. a(a(a)) --> [a].
3. a(a(a,X)) --> [a], a(X).
4. b(a(a)) --> [b].
5. b(a(a,X)) --> [b], b(X).
6. c(a(a)) --> [c].
7. c(a(a,X)) --> [c], c(X).
Extra arguments
• A CFG+EA for aⁿbⁿcⁿ, n>0:
– [figure: set membership question]
Extra arguments
• A CFG+EA grammar for aⁿbⁿcⁿ, n>0:
– [figure: set enumeration]
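Both kinds of query can be sketched against the seven-rule CFG+EA grammar. This is a minimal sketch assuming SWI-Prolog; the rules are copied from the slide, and the queries are illustrative. The shared variable A in rule 1 acts as memory: the term built while parsing the a's must unify with the terms built for the b's and the c's, which forces equal counts.

```prolog
s(s(A,A,A)) --> a(A), b(A), c(A).
a(a(a))     --> [a].
a(a(a,X))   --> [a], a(X).
b(a(a))     --> [b].
b(a(a,X))   --> [b], b(X).
c(a(a))     --> [c].
c(a(a,X))   --> [c], c(X).

% Set membership:
% ?- s(T, [a,a,b,b,c,c], []).   % T = s(a(a,a(a)),a(a,a(a)),a(a,a(a)))
% ?- s(T, [a,a,b,c], []).       % false
%
% Set enumeration:
% ?- s(_, L, []).               % L = [a,b,c] ; L = [a,a,b,b,c,c] ; ...
```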
Another grammar for {aⁿbⁿcⁿ | n>0}
• Use Prolog's arithmetic predicates.
• { … } embeds Prolog code inside grammar rules
– Arithmetic goals are neither nonterminal nor terminal symbols, so when they are used in grammar rules we must enclose them within curly braces
Another Grammar for {aⁿbⁿcⁿ | n>0}
• Explicit computation of the number of a's using arithmetic.
• { … } embeds Prolog code inside grammar rules
Another Grammar for {aⁿbⁿcⁿ | n>0}
• Parsing the a's
Another Grammar for {aⁿbⁿcⁿ | n>0}
• Computing the b's
Another Grammar for {aⁿbⁿcⁿ | n>0}
• Computing the c's
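The counting idea can be sketched for the a's alone. This is a minimal sketch assuming SWI-Prolog; it uses the same rule shape the lecture gives for a, with the arithmetic goal inside curly braces so that it is run as ordinary Prolog rather than treated as a grammar symbol.

```prolog
% a(N): parse one or more a's and return how many were seen.
a(1) --> [a].
a(N) --> [a], a(M), { N is M+1 }.   % count the rest first, then add one

% ?- a(N, [a,a,a], []).    % N = 3
% ?- a(N, [a,b], []).      % false: the whole list must be a's
```

The rules for b and c have the same shape, with [b] or [c] in place of [a].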
Another grammar for {aⁿbⁿcⁿ | n>0}
• Grammar is “correct” but not so efficient…
– consider string [a,a,b,b,b,b,b,b,b,c,c]
• s --> a(X), b(X), c(X).
• a(1) --> [a].
• a(N) --> [a], a(M), {N is M+1}.
• b(1) --> [b].
• b(N) --> [b], b(M), {N is M+1}.
• c(1) --> [c].
• c(N) --> [c], c(M), {N is M+1}.
• Note: the grammar counts upwards; could change to count down
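One way the count-down variant might look is sketched below. This is an assumption about what the remark has in mind, not the lecture's own code: a still counts upwards because the number of a's is unknown until they are parsed, while b and c check the already known count before consuming input. SWI-Prolog is assumed.

```prolog
s --> a(X), b(X), c(X).

% Count upwards: N is computed after the a's have been parsed.
a(1) --> [a].
a(N) --> [a], a(M), { N is M+1 }.

% Count downwards: N must already be bound when b or c is called
% (calling them with an unbound N raises an instantiation error).
b(1) --> [b].
b(N) --> { N > 1, M is N-1 }, [b], b(M).

c(1) --> [c].
c(N) --> { N > 1, M is N-1 }, [c], c(M).

% ?- s([a,a,b,b,c,c], []).               % true
% ?- s([a,a,b,b,b,b,b,b,b,c,c], []).     % false
```

On the problem string, b(2) consumes exactly two b's and c(2) then fails at once on the next b, instead of parsing all seven b's before comparing counts.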
A context-sensitive grammar for {aⁿbⁿcⁿ | n>0}
• A context-sensitive grammar has rules of the form LHS → RHS
– such that both LHS and RHS can be arbitrary strings of terminals and non-terminals, and
– |RHS| ≥ |LHS|
• (exception: S → ε, S not in RHS)
• This is almost a normal Prolog DCG:
– (but rules 5 & 6 contain more than one symbol on the LHS):
1. s --> [a,b,c].
2. s --> [a],a,[b,c].
3. a --> [a,b], c.
4. a --> [a],a,[b],c.
5. c,[b] --> [b], c.
6. c,[c] --> [c,c].
• Rules 5 and 6 are responsible for shuffling the c's to the end
A context-sensitive grammar for {aⁿbⁿcⁿ | n>0}
• DCG rules:
1. s --> [a,b,c].
2. s --> [a],a,[b,c].
3. a --> [a,b], c.
4. a --> [a],a,[b],c.
5. c,[b] --> [b], c.
6. c,[c] --> [c,c].
• Translated Prolog clauses:
?- listing([s,a,c]).
1. s([a, b, c|A], A).
2. s([a|A], C) :- a(A, B), B=[b, c|C].
3. a([a, b|A], B) :- c(A, B).
4. a([a|A], D) :- a(A, B), B=[b|C], c(C, D).
5. c(A, C) :- A=[b|B], c(B, D), C=[b|D].
6. c([c, c|A], [c|A]).
A context-sensitive grammar for {aⁿbⁿcⁿ | n>0}
• Derivation of [a,a,a,b,b,b,c,c,c]
1. s
2. [a],a,[b,c]
3. [a],[a],a,[b],c,[b,c]
4. [a],[a],[a,b],c,[b],c,[b,c]
5. [a],[a],[a,b],[b],c,c,[b,c]
6. [a],[a],[a,b],[b],c,[b],c,[c]
7. [a],[a],[a,b],[b],[b],c,c,[c]
8. [a],[a],[a,b],[b],[b],c,[c,c]
9. [a],[a],[a,b],[b],[b],[c,c,c]
10. [a,a,a,b,b,b,c,c,c]
• Rules:
1. s --> [a,b,c].
2. s --> [a],a,[b,c].
3. a --> [a,b], c.
4. a --> [a],a,[b],c.
5. c,[b] --> [b], c.
6. c,[c] --> [c,c].
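The six rules can be loaded as written in a Prolog system whose DCG translation supports pushback lists in rule heads. This is a minimal sketch assuming SWI-Prolog, which accepts heads of the form NonTerminal, PushBack; the queries are illustrative.

```prolog
s --> [a,b,c].
s --> [a], a, [b,c].
a --> [a,b], c.
a --> [a], a, [b], c.
c, [b] --> [b], c.      % pushback: consume b then c, put a b back on the input
c, [c] --> [c,c].       % pushback: consume two c's, put one c back

% ?- s([a,a,a,b,b,b,c,c,c], []).   % true
% ?- s([a,a,b,b,b,c,c,c], []).     % false
% ?- s([a,a,a,b,b,c,c,c], []).     % false
% ?- s([a,a,a,b,b,b,c,c], []).     % false
```

Each call to c on a known string either consumes a b before recursing or consumes two c's, and each call to a consumes an a, so these membership queries terminate.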
A context-sensitive grammar for {aⁿbⁿcⁿ | n>0}
1. s([a,a,a,b,b,b,c,c,c],[])
   1. a([a,a,b,b,b,c,c,c],B)
      1. a([a,b,b,b,c,c,c],B)
         1. c([b,b,c,c,c],B)
            1. c([b,c,c,c],D)
               1. c([c,c,c],D)
               2. => c([c,c,c],[c,c])
               3. C=[b|[c,c]]
            2. => c([b,c,c,c],[b,c,c])
            3. C=[b|[b,c,c]]
         2. => c([b,b,c,c,c],[b,b,c,c])
      2. => a([a,b,b,b,c,c,c],[b,b,c,c])
      3. [b,b,c,c]=[b|C] (C=[b,c,c])
      4. c([b,c,c],D)
         1. c([c,c],D)
         2. => c([c,c],[c])
         3. C=[b|[c]]
      5. => c([b,c,c],[b,c])
   2. => a([a,a,b,b,b,c,c,c],[b,c])
   3. [b,c]=[b,c|[]]
• Clauses:
1. s([a, b, c|A], A).
2. s([a|A], C) :- a(A, B), B=[b, c|C].
3. a([a, b|A], B) :- c(A, B).
4. a([a|A], D) :- a(A, B), B=[b|C], c(C, D).
5. c(A, C) :- A=[b|B], c(B, D), C=[b|D].
6. c([c, c|A], [c|A]).
A context-sensitive grammar for {aⁿbⁿcⁿ | n>0}
1. s([a,a,b,b,b,c,c,c],[])
   1. a([a,b,b,b,c,c,c],B)
      1. c([b,b,c,c,c],B)
         1. c([b,c,c,c],D)
            1. c([c,c,c],D)
            2. => c([c,c,c],[c,c])
            3. C=[b|[c,c]]
         2. => c([b,c,c,c],[b,c,c])
         3. C=[b|[b,c,c]]
      2. => c([b,b,c,c,c],[b,b,c,c])
   2. => a([a,b,b,b,c,c,c],[b,b,c,c])
   3. [b,b,c,c]=[b,c|[]] FAIL
• Clauses:
1. s([a, b, c|A], A).
2. s([a|A], C) :- a(A, B), B=[b, c|C].
3. a([a, b|A], B) :- c(A, B).
4. a([a|A], D) :- a(A, B), B=[b|C], c(C, D).
5. c(A, C) :- A=[b|B], c(B, D), C=[b|D].
6. c([c, c|A], [c|A]).
A context-sensitive grammar for {aⁿbⁿcⁿ | n>0}
1. s([a,a,a,b,b,c,c,c],[])
   1. a([a,a,b,b,c,c,c],B)
      1. a([a,b,b,c,c,c],B)
         1. c([b,c,c,c],B)
            1. c([c,c,c],D)
            2. => c([c,c,c],[c,c])
            3. C=[b|[c,c]]
         2. => c([b,c,c,c],[b,c,c])
      2. => a([a,b,b,c,c,c],[b,c,c])
      3. [b,c,c]=[b|C] (C=[c,c])
      4. c([c,c],D)
      5. => c([c,c],[c])
   2. => a([a,a,b,b,c,c,c],[c])
   3. [c]=[b,c|[]] FAIL
• Clauses:
1. s([a, b, c|A], A).
2. s([a|A], C) :- a(A, B), B=[b, c|C].
3. a([a, b|A], B) :- c(A, B).
4. a([a|A], D) :- a(A, B), B=[b|C], c(C, D).
5. c(A, C) :- A=[b|B], c(B, D), C=[b|D].
6. c([c, c|A], [c|A]).
A context-sensitive grammar for {aⁿbⁿcⁿ | n>0}
1. s([a,a,a,b,b,b,c,c],[])
   1. a([a,a,b,b,b,c,c],B)
      1. a([a,b,b,b,c,c],B)
         1. c([b,b,c,c],B)
            1. c([b,c,c],D)
               1. c([c,c],D)
               2. => c([c,c],[c])
               3. C=[b|[c]]
            2. => c([b,c,c],[b,c])
            3. C=[b|[b,c]]
         2. => c([b,b,c,c],[b,b,c])
      2. => a([a,b,b,b,c,c],[b,b,c])
      3. [b,b,c]=[b|C] (C=[b,c])
      4. c([b,c],D)
         1. c([c],D) FAIL
• Clauses:
1. s([a, b, c|A], A).
2. s([a|A], C) :- a(A, B), B=[b, c|C].
3. a([a, b|A], B) :- c(A, B).
4. a([a|A], D) :- a(A, B), B=[b|C], c(C, D).
5. c(A, C) :- A=[b|B], c(B, D), C=[b|D].
6. c([c, c|A], [c|A]).
Natural Language Parsing
• Syntax trees are a big deal in NLP
• Stanford Parser
– http://nlp.stanford.edu:8080/parser/index.jsp
– Uses probabilistic rules learnt from a Treebank corpus
We do a lot with Treebanks in the follow-on course to this one (LING 581, Spring)