NEURONAL DYNAMICS 2: ACTIVATION MODELS (Part II)

Name: 覃桂敏
Student ID: 0622210211
Bivalent BAM theorem
The average signal energy L of the forward pass of the FX signal state vector S(X) through M, and the backward pass of the FY signal state vector S(Y) through M^T:

L = -[S(X) M S(Y)^T + S(Y) M^T S(X)^T] / 2

Since S(Y) M^T S(X)^T = [S(Y) M^T S(X)^T]^T = S(X) M S(Y)^T,

L = -S(X) M S(Y)^T = -Σ_{i=1}^{n} Σ_{j=1}^{p} S_i(x_i) S_j(y_j) m_ij
Lower bound of Lyapunov function
The signal-energy Lyapunov function is clearly bounded below.
For binary or bipolar signals, the matrix coefficients define the attainable bound:

L ≥ -Σ_{i=1}^{n} Σ_{j=1}^{p} |m_ij|

The attainable upper bound is the negative of this expression.
Lyapunov function for the general BAM system
The signal-energy Lyapunov function for the general BAM system takes the form

L = -S(X) M S(Y)^T - S(X)[I - U]^T - S(Y)[J - V]^T

with inputs I = [I_1, ..., I_n] and J = [J_1, ..., J_p], and constant vectors of thresholds U = [U_1, ..., U_n] and V = [V_1, ..., V_p].
The attainable bound of this function is

L ≥ -Σ_i Σ_j |m_ij| - Σ_i |I_i - U_i| - Σ_j |J_j - V_j|
Bivalent BAM theorem
Bivalent BAM theorem: every matrix is bidirectionally stable
for synchronous or asynchronous state changes.
Proof: consider the signal state changes that occur from time k to time k+1. Define the vectors of signal state changes as

ΔS(X) = S(X_{k+1}) - S(X_k) = (ΔS_1(x_1), ..., ΔS_n(x_n)),
ΔS(Y) = S(Y_{k+1}) - S(Y_k) = (ΔS_1(y_1), ..., ΔS_p(y_p)).
Bivalent BAM theorem
Define the individual state changes as

ΔS_i(x_i) = S_i(x_i^{k+1}) - S_i(x_i^k),
ΔS_j(y_j) = S_j(y_j^{k+1}) - S_j(y_j^k).

We assume at least one neuron changes state from time k to time k+1.
Any subset of neurons in a field can change state, but in only one field at a time.
For binary threshold signal functions, if a state change is nonzero, then
Bivalent BAM theorem
ΔS_i(x_i) = 1 - 0 = 1   or   ΔS_i(x_i) = 0 - 1 = -1.

For bipolar threshold signal functions

ΔS_i(x_i) = 2   or   ΔS_i(x_i) = -2.

The "energy" change

ΔL = L_{k+1} - L_k

differs from zero because of changes in field FX or in field FY.
Bivalent BAM theorem
Suppose the state change occurs in FX. Then

ΔL = -ΔS(X) M S(Y_k)^T - ΔS(X)[I - U]^T
   = -ΔS(X)[S(Y_k) M^T + I - U]^T
   = -Σ_i Σ_j ΔS_i(x_i) S_j(y_j^k) m_ij - Σ_i ΔS_i(x_i) I_i + Σ_i ΔS_i(x_i) U_i
   = -Σ_i ΔS_i(x_i) [Σ_j S_j(y_j^k) m_ij + I_i - U_i]
   = -Σ_i ΔS_i(x_i) [x_i^{k+1} - U_i],

where the last step uses the additive activation update x_i^{k+1} = Σ_j S_j(y_j^k) m_ij + I_i.
Bivalent BAM theorem
Suppose Si ( xi )  0
Then Si ( xi )  Si ( xi
k 1
)  Si ( xi k )
 1 0
k 1
This implies xi  U i so the product is positive:
Si ( xi )[xik 1  U i ]  0
Another case suppose S i ( xi )  0
Si ( xi )  Si ( xi
k 1
)  Si ( xi )
k
 0 1
9
Bivalent BAM theorem
k 1
This implies xi
 Ui
so the product is positive:
Si ( xi )[xik 1  U i ]  0
So Lk 1  Lk  0 for every state change.
Since L is bounded,L behaves as a Lyapunov function for
the additive BAM dynamical system defined by before.
Since the matrix M was arbitrary,every matrix is
bidirectionally stable. The bivalent BAM theorem is proved.
10
Property of globally stable dynamical system
Two insights about the rate of convergence
First, the individual energies decrease nontrivially. The BAM system does not creep arbitrarily slowly down the Lyapunov or "energy" surface toward the nearest local minimum. The system takes definite hops into the basin of attraction of the fixed point.
Second, a synchronous BAM tends to converge faster than an asynchronous BAM. In other words, asynchronous updating should take more iterations to converge.
Review
Neuronal Dynamical Systems
We describe neuronal dynamical systems by first-order differential or difference equations that govern the time evolution of the neuronal activations or membrane potentials:

ẋ = g(F_X, F_Y, ...)
ẏ = h(F_X, F_Y, ...)
Review
Additive activation models
ẋ_i = -A_i x_i + Σ_{j=1}^{p} S_j(y_j) n_ji + I_i

ẏ_j = -A_j y_j + Σ_{i=1}^{n} S_i(x_i) m_ij + J_j

Hopfield circuit:
1. Additive autoassociative model;
2. Strictly increasing bounded signal function (S′ > 0);
3. Synaptic connection matrix is symmetric (M = M^T).

C_i ẋ_i = -x_i / R_i + Σ_j S_j(x_j) m_ji + I_i
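A minimal numerical sketch (not from the slides) of these additive activation equations, using forward-Euler integration; the logistic signal function, dimensions, and parameter values are illustrative assumptions.

```python
# Sketch: Euler integration of the two-field additive activation model
# dx_i/dt = -A_i x_i + sum_j S_j(y_j) n_ji + I_i, and its FY counterpart.
import numpy as np

def S(v):
    """Logistic signal function (assumed; any bounded increasing S works)."""
    return 1.0 / (1.0 + np.exp(-v))

def additive_bam_step(x, y, M, N, I, J, A_x, A_y, dt=0.01):
    """One Euler step. M: n-by-p forward matrix (FX -> FY); N: p-by-n backward matrix."""
    dx = -A_x * x + S(y) @ N + I      # dx_i = -A_i x_i + sum_j S_j(y_j) n_ji + I_i
    dy = -A_y * y + S(x) @ M + J      # dy_j = -A_j y_j + sum_i S_i(x_i) m_ij + J_j
    return x + dt * dx, y + dt * dy

# Tiny usage example with arbitrary dimensions n = 3, p = 2.
rng = np.random.default_rng(0)
n, p = 3, 2
M = rng.standard_normal((n, p)); N = M.T.copy()   # BAM feedback uses N = M^T
x, y = rng.standard_normal(n), rng.standard_normal(p)
I, J = np.zeros(n), np.zeros(p)
for _ in range(1000):
    x, y = additive_bam_step(x, y, M, N, I, J, A_x=1.0, A_y=1.0)
print(x, y)   # activations settle toward an equilibrium
```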
Review
Additive bivalent models
x_i^{k+1} = Σ_{j=1}^{p} S_j(y_j^k) m_ij + I_i

y_j^{k+1} = Σ_{i=1}^{n} S_i(x_i^k) m_ij + J_j

Lyapunov Functions
If we cannot find a Lyapunov function, nothing follows;
if we can find a Lyapunov function, stability holds.
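A small sketch of this additive bivalent BAM update and its signal-energy Lyapunov function, assuming bipolar threshold signals with zero thresholds and zero external inputs; keeping the previous signal at exactly zero activation is also an assumed convention.

```python
import numpy as np

def threshold(act, prev):
    """Bipolar threshold signal: +1 above 0, -1 below 0, keep the previous signal at 0."""
    return np.where(act > 0, 1, np.where(act < 0, -1, prev))

def bam_energy(M, x_sig, y_sig):
    """Signal-energy Lyapunov function L = -S(X) M S(Y)^T."""
    return -float(x_sig @ M @ y_sig)

def bam_synchronous(M, x_sig, y_sig, max_steps=20):
    """Iterate the additive bivalent BAM until the signal pair stops changing."""
    for _ in range(max_steps):
        y_new = threshold(x_sig @ M, y_sig)      # forward pass through M
        x_new = threshold(y_new @ M.T, x_sig)    # backward pass through M^T
        if np.array_equal(x_new, x_sig) and np.array_equal(y_new, y_sig):
            break
        x_sig, y_sig = x_new, y_new
    return x_sig, y_sig

# usage: x_fix, y_fix = bam_synchronous(M, x0, y0); bam_energy never increases along the run
```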
Review
L xi

L
i xi t
n
A dynamics system is
0
stable , if L
;
 0
asymptotically stable, if L
L

xi
i xi
n
.
Monotonicity of a lyapunov function is a sufficient
not necessary condition for stability and asymptotic
stability.
16
Review
Bivalent BAM theorem.
Every matrix is bidirectionally stable for synchronous or
asynchronous state changes.
• Synchronous: update an entire field of neurons at a time.
• Simple asynchronous: only one neuron makes a state-change decision.
• Subset asynchronous: one subset of neurons per field makes state-change decisions at a time.
A short sketch of the three policies follows below.
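An illustrative sketch of the three update policies applied to field FX (bipolar signals with zero thresholds assumed; the helper name `update_fx` and the index sets are hypothetical).

```python
import numpy as np

def update_fx(M, x_sig, y_sig, idx):
    """Recompute the signals of the FX neurons listed in idx; leave the rest unchanged."""
    act = y_sig @ M.T                       # activations x_i = sum_j S_j(y_j) m_ij
    new = x_sig.copy()
    sel = np.asarray(idx)
    new[sel] = np.where(act[sel] > 0, 1, np.where(act[sel] < 0, -1, x_sig[sel]))
    return new

# synchronous:         update_fx(M, x_sig, y_sig, np.arange(x_sig.size))  # whole field
# simple asynchronous: update_fx(M, x_sig, y_sig, [2])                    # one neuron
# subset asynchronous: update_fx(M, x_sig, y_sig, [0, 3])                 # a chosen subset
```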
Chapter 3. Neural Dynamics II:Activation Models
The most popular method for constructing M:the bipolar
Hebbian or outer-product learning method
binary vector associations: (A_i, B_i)
bipolar vector associations: (X_i, Y_i)

A_i = (1/2)[X_i + 1],   X_i = 2 A_i - 1,   i = 1, 2, ..., m
Chapter 3. Neural Dynamics II:Activation Models
The binary outer-product law:
M = Σ_{k=1}^{m} A_k^T B_k

The bipolar outer-product law:
M = Σ_{k=1}^{m} X_k^T Y_k

The Boolean outer-product law forms the same outer products A_k^T B_k but combines them with a pointwise maximum (Boolean sum):
m_ij = max(a_i^1 b_j^1, ..., a_i^m b_j^m)
Chapter 3. Neural Dynamics II:Activation Models
The weighted outer-product law:
M = Σ_{k=1}^{m} w_k X_k^T Y_k,   where   Σ_{k=1}^{m} w_k = 1 holds.

In matrix notation:
M = X^T W Y,   where
X^T = [X_1^T | ... | X_m^T]
Y^T = [Y_1^T | ... | Y_m^T]
W = Diagonal[w_1, ..., w_m]
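A sketch of these outer-product encoding laws in NumPy, with the associations stacked as rows of A, B (binary) and X, Y (bipolar); the weight vector w is an assumption of the example.

```python
import numpy as np

def bipolar_outer_product(X, Y):
    """M = sum_k X_k^T Y_k, with X (m x n) and Y (m x p) bipolar row vectors."""
    return X.T @ Y

def boolean_outer_product(A, B):
    """m_ij = max_k a_i^k b_j^k for binary row vectors A_k, B_k."""
    return np.max(A[:, :, None] * B[:, None, :], axis=0)

def weighted_outer_product(X, Y, w):
    """M = sum_k w_k X_k^T Y_k = X^T W Y, with the w_k summing to one."""
    return X.T @ np.diag(w) @ Y
```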
Chapter 3. Neural Dynamics II:Activation Models
※3.6.1 Optimal Linear Associative Memory Matrices
Optimal linear associative memory matrices:
M = X* Y

The pseudo-inverse matrix X* of X satisfies:
X X* X = X
X* X X* = X*
X* X = (X* X)^T
X X* = (X X*)^T
Chapter 3. Neural Dynamics II:Activation Models
※3.6.1 Optimal Linear Associative Memory Matrices
Optimal linear associative memory matrices:
The pseudo-inverse matrix of X:

If x is a nonzero scalar: x* = 1/x
If x is a nonzero vector: x* = x^T / (x x^T)
If x is a zero scalar or zero vector: x* = 0

For a rectangular matrix X, if (X X^T)^{-1} exists:
X* = X^T (X X^T)^{-1}
Chapter 3. Neural Dynamics II:Activation Models
※3.6.1 Optimal Linear Associative Memory Matrices
Define the matrix Euclidean norm ||M|| as

||M|| = √(Trace(M M^T))

Minimize the mean-squared error of forward recall; that is, find M̂ that satisfies the relation

||Y - X M̂|| ≤ ||Y - X M||   for all M
Chapter 3. Neural Dynamics II:Activation Models
※3.6.1 Optimal Linear Associative Memory Matrices
1
X
Suppose further that the inverse matrix
exists.
Then
0 0
 Y Y
 Y - XX -1Y
ˆ  X 1Y
So the OLAM matrix M̂ correspond to M
Chapter 3. Neural Dynamics II:Activation Models
If the set of vectors {X_1, ..., X_m} is orthonormal,

X_i X_j^T = 1 if i = j,   0 if i ≠ j,

then the OLAM matrix reduces to the classical linear associative memory (LAM):

M̂ = X^T Y,

since for orthonormal X the pseudo-inverse of X equals X^T.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
Autoassociative OLAM systems behave as linear filters.
In the autoassociative case the OLAM matrix encodes only the known signal vectors X_i. Then the OLAM matrix equation (3-78) reduces to

M = X* X.

M linearly "filters" an input measurement x to the output vector x′ by vector-matrix multiplication: x M = x′.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
The OLAM matrix X* X behaves as a projection operator [Sorenson, 1980]. Algebraically, this means the matrix M is idempotent: M² = M.
Since matrix multiplication is associative, pseudo-inverse property (3-80) implies idempotency of the autoassociative OLAM matrix M:

M² = M M = X* X X* X = (X* X X*) X = X* X = M
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
Then (3-80) also implies that the additive dual matrix I - X* X behaves as a projection operator:

(I - X* X)² = (I - X* X)(I - X* X)
            = I² - X* X - X* X + X* X X* X
            = I - 2 X* X + (X* X X*) X
            = I - 2 X* X + X* X
            = I - X* X

We can represent a projection matrix M as the mapping M: R^n → L.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
The Pythagorean theorem underlies projection
operators.
The known signal vectors X_1, ..., X_m span some unique linear subspace L(X_1, ..., X_m) of R^n.
L equals {Σ_{i=1}^{m} c_i X_i : c_i ∈ R}, the set of all linear combinations of the m known signal vectors.
L⊥ denotes the orthogonal complement space
{x ∈ R^n : x y^T = 0 for all y ∈ L},
the set of all real n-vectors x orthogonal to every n-vector y in L.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
1. The operator X* X projects R^n onto L.
2. The dual operator I - X* X projects R^n onto L⊥.

The projection operators X* X and I - X* X uniquely decompose every vector x in R^n into a summed signal vector x̂ and a noise or novelty vector x̃:

x = x X* X + x (I - X* X) = x̂ + x̃
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
The unique additive decomposition x = x̂ + x̃ obeys a generalized Pythagorean theorem:

||x||² = ||x̂||² + ||x̃||²,

where ||x||² = x_1² + ... + x_n² defines the squared Euclidean or l² norm.
Kohonen [1988] calls I - X* X the novelty filter on R^n.
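A short sketch of the autoassociative OLAM projection, the novelty filter, and the generalized Pythagorean identity; the stored vectors and dimensions are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 8
X = rng.standard_normal((m, n))          # m known signal vectors in R^n (rows)
P = np.linalg.pinv(X) @ X                # X* X: projection onto L = span{X_i}
Q = np.eye(n) - P                        # I - X* X: novelty filter, projection onto L-perp

x = rng.standard_normal(n)               # arbitrary measurement
x_hat, x_tilde = x @ P, x @ Q            # x = x_hat + x_tilde
assert np.allclose(x, x_hat + x_tilde)
assert np.allclose(P @ P, P)             # idempotency: M^2 = M
# generalized Pythagorean theorem: ||x||^2 = ||x_hat||^2 + ||x_tilde||^2
assert np.isclose(x @ x, x_hat @ x_hat + x_tilde @ x_tilde)
```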
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
The projection x̂ measures what we know about the input x relative to the stored signal vectors X_1, ..., X_m:

x̂ = Σ_{i=1}^{m} c_i X_i

for some constant vector (c_1, ..., c_m).
The novelty vector x̃ measures what is maximally unknown or novel in the measured input signal x.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
Suppose we model a random measurement vector x as a random signal vector x_s corrupted by an additive, independent random-noise vector x_N:

x = x_s + x_N.

We can estimate the unknown signal x_s as the OLAM-filtered output x̂ = x X* X.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
Kohonen [1988] has shown that if the multivariable noise distribution is radially symmetric, such as a multivariable Gaussian distribution, then the OLAM capacity m and pattern dimension n scale the variance of the random-variable estimator-error norm ||x̂ - x_s||:

V[||x̂ - x_s||] = (m/n) ||x - x_s||² = (m/n) ||x_N||²
Chapter 3. Neural Dynamics II:Activation Models
※3.6.2 Autoassociative OLAM Filtering
1. The autoassociative OLAM filter suppresses noise if m < n, when memory capacity does not exceed signal dimension.
2. The OLAM filter amplifies noise if m > n, when capacity exceeds dimension.
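A rough Monte Carlo illustration of this noise-suppression claim (sizes, trial count, and unit-variance Gaussian noise are illustrative assumptions): the mean squared filtering error tracks (m/n)·E||x_N||².

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_error(m, n, trials=2000):
    X = rng.standard_normal((m, n))            # m stored signal vectors
    P = np.linalg.pinv(X) @ X                  # OLAM filter X* X
    errs = []
    for _ in range(trials):
        x_s = rng.standard_normal(m) @ X       # a signal lying in span{X_i}
        x_N = rng.standard_normal(n)           # radially symmetric noise, E||x_N||^2 = n
        x_hat = (x_s + x_N) @ P                # OLAM-filtered estimate of x_s
        errs.append(np.sum((x_hat - x_s) ** 2))
    return np.mean(errs)

print(mean_error(m=4, n=40))    # m << n: error near m, far below E||x_N||^2 = n
print(mean_error(m=36, n=40))   # m close to n: much less suppression
```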
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
The above data-dependent encoding schemes add
outer-product correlation matrices.
The following example illustrates a complete nonlinear feedback neural network in action, with data deliberately encoded into the system dynamics.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
Suppose the data consists of two unweighted (w_1 = w_2 = 1) binary associations (A_1, B_1) and (A_2, B_2) defined by the nonorthogonal binary signal vectors:

A_1 = (1 0 1 0 1 0),   B_1 = (1 1 0 0)
A_2 = (1 1 1 0 0 0),   B_2 = (1 0 1 0)
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
These binary associations correspond to the two bipolar associations (X_1, Y_1) and (X_2, Y_2) defined by the bipolar signal vectors:

X_1 = (1 -1 1 -1 1 -1),   Y_1 = (1 1 -1 -1)
X_2 = (1 1 1 -1 -1 -1),   Y_2 = (1 -1 1 -1)
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
We compute the BAM memory matrix M by adding the bipolar correlation matrices X_1^T Y_1 and X_2^T Y_2 pointwise. The first correlation matrix X_1^T Y_1 equals

X_1^T Y_1 = [  1   1  -1  -1 ]
            [ -1  -1   1   1 ]
            [  1   1  -1  -1 ]
            [ -1  -1   1   1 ]
            [  1   1  -1  -1 ]
            [ -1  -1   1   1 ]
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
Observe that the i-th row of the correlation matrix X_1^T Y_1 equals the bipolar vector Y_1 multiplied by the i-th element of X_1. The j-th column behaves similarly. So X_2^T Y_2 equals

X_2^T Y_2 = [  1  -1   1  -1 ]
            [  1  -1   1  -1 ]
            [  1  -1   1  -1 ]
            [ -1   1  -1   1 ]
            [ -1   1  -1   1 ]
            [ -1   1  -1   1 ]
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
Adding these matrices pointwise gives M:

M = X_1^T Y_1 + X_2^T Y_2 = [  2   0   0  -2 ]
                            [  0  -2   2   0 ]
                            [  2   0   0  -2 ]
                            [ -2   0   0   2 ]
                            [  0   2  -2   0 ]
                            [ -2   0   0   2 ]
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
Suppose, first, we use binary state vectors. All update policies are synchronous. Suppose we present the binary vector A_1 as input to the system, as the current signal state vector at FX. Then applying the threshold law (3-26) synchronously gives

A_1 M = (4  2  -2  -4)  →  (1  1  0  0) = B_1
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
Passing B_1 through the backward filter M^T, and applying the threshold law (3-27), gives back A_1:

B_1 M^T = (2  -2  2  -2  2  -2)  →  (1  0  1  0  1  0) = A_1

So (A_1, B_1) is a fixed point of the BAM dynamical system.
It has Lyapunov "energy" L(A_1, B_1) = -A_1 M B_1^T = -6, which equals the backward value -B_1 M^T A_1^T = -6.
(A_2, B_2) behaves similarly: it is a fixed point with energy -A_2 M B_2^T = -6.
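A numerical replay of this example, encoding M with the bipolar outer-product law and checking the recalls and energies quoted above (binary threshold taken as >0 → 1, <0 → 0, previous bit kept at exactly 0).

```python
import numpy as np

A = np.array([[1, 0, 1, 0, 1, 0],
              [1, 1, 1, 0, 0, 0]])
B = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0]])
X, Y = 2 * A - 1, 2 * B - 1            # bipolar versions of the associations
M = X.T @ Y                            # bipolar outer-product encoding

def thresh(act, prev):
    return np.where(act > 0, 1, np.where(act < 0, 0, prev))

print(A[0] @ M)                        # [ 4  2 -2 -4]  -> thresholds to B1
print(thresh(A[0] @ M, B[0]))          # [1 1 0 0] = B1
print(thresh(B[0] @ M.T, A[0]))        # [1 0 1 0 1 0] = A1
print(-A[0] @ M @ B[0])                # Lyapunov "energy" L(A1, B1) = -6
# a noisy input one Hamming bit away from A2 still recalls B2:
A_noisy = np.array([0, 1, 1, 0, 0, 0])
print(thresh(A_noisy @ M, B[1]))       # [1 0 1 0] = B2
```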
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
So the two deliberately encoded fixed points reside in
equally “deep” attractors.
The Hamming distance H equals the l¹ distance. H(A_i, A_j) counts the number of slots in which the binary vectors A_i and A_j differ:

H(A_i, A_j) = Σ_{k=1}^{n} |a_k^i - a_k^j|
Chapter 3. Neural Dynamics II:Activation Models
※3.6.3 BAM Correlation Encoding Example
Consider for example the input A = (0 1 1 0 0 0), which differs from A_2 = (1 1 1 0 0 0) by 1 bit, or H(A, A_2) = 1. Then

A M = (2  -2  2  -2)  →  (1  0  1  0) = B_2

Fig. 3.2 shows that the BAM still recalls the stored association despite the noisy input.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.4 Memory Capacity:Dimensionality Limits Capacity
Synaptic connection matrices encode limited
information.
The more correlation matrices we sum, the more frequently |m_ij| ≫ 1 holds.
After a point, adding additional associations (A_k, B_k) does not significantly change the connection matrix. The system "forgets" some patterns.
This limits the memory capacity.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.4 Memory Capacity:Dimensionality Limits Capacity
Grossberg's sparse coding theorem [1976] says, for deterministic encoding, that pattern dimensionality must exceed pattern number to prevent learning some patterns at the expense of forgetting others.
Chapter 3. Neural Dynamics II:Activation Models
※3.6.5 The Hopfield Model
The Hopfield model illustrates an autoassociative additive
bivalent BAM operated serially with simple asynchronous
state changes.
Autoassociativity means the network topology reduces to only one field, FX, of neurons: FX = FY. The synaptic connection matrix M symmetrically intraconnects the n neurons in field FX:

M = M^T,   or   m_ij = m_ji.
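A minimal sketch of the Hopfield model as an autoassociative BAM with simple asynchronous updates; the stored patterns, zero thresholds, and zeroed diagonal are illustrative assumptions rather than details from the slides.

```python
import numpy as np

rng = np.random.default_rng(4)
patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1,  1, 1,  1, -1, -1, -1, -1]])
M = patterns.T @ patterns                  # symmetric intraconnections: M = M^T
np.fill_diagonal(M, 0)                     # common convention (assumed): no self-feedback

def hopfield_recall(M, x, sweeps=10):
    """Simple asynchronous updates: one randomly chosen neuron at a time."""
    x = x.copy()
    for _ in range(sweeps * len(x)):
        i = rng.integers(len(x))
        act = M[i] @ x
        if act != 0:                       # keep the previous state at zero activation
            x[i] = 1 if act > 0 else -1
    return x

probe = patterns[0].copy(); probe[0] *= -1   # flip one bit of the first stored pattern
print(hopfield_recall(M, probe))             # recovers patterns[0]
```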
Thanks for your attention!