Stability of Feedback Equilibrium Solutions for Noncooperative

Stability of Feedback Equilibrium Solutions
for Noncooperative Differential Games
Alberto Bressan
Department of Mathematics, Penn State University
Alberto Bressan (Penn State)
Noncooperative Games
1 / 33
Differential games: the PDE approach
The search for equilibrium solutions to noncooperative differential
games in feedback form leads to a nonlinear system of
Hamilton-Jacobi PDEs for the values functions
main focus: existence and stability of the solutions to these PDEs
Alberto Bressan (Penn State)
Noncooperative Games
2 / 33
Any system can be locally approximated by a linear one:
ẋ = f (x, u1 , u2 ),
ẋ = Ax + B1 u1 + B2 u2
Any cost functional can be locally approximated by a quadratic one.
Main issue: Assume that an equilibrium solution is found, to an
approximating game with linear dynamics and quadratic costs.
Does the original nonlinear game also have an equilibrium solution,
close to the linear-quadratic one?
Two cases
(i) finite time horizon
(ii) infinite time horizon
Alberto Bressan (Penn State)
Noncooperative Games
3 / 33
An example - approximating a Cauchy problem for a PDE
ut = uxx
u(0, x) = ϕ(x)
Approximate the initial data:
Conclude:
ϕ(x) ≈ ax 2 + bx + c
u(t, x) ≈ ax 2 + bx + c + 2at
(
Alberto Bressan (Penn State)
CORRECT
for t ≥ 0
WRONG
for t < 0
Noncooperative Games
4 / 33
Noncooperative differential games with finite horizon
Finite Horizon Games
x ∈ Rn
state of the system
u1 , u2
controls implemented by the players
Dynamics:
ẋ(t) = f (x, u1 , u2 )
x(τ ) = y
Goal of i-th player:
maximize:
.
Ji (τ, y , u1 , u2 ) = ψi x(T ) −
Z
T
Li x(t), u1 (t), u2 (t) dt
τ
= [terminal payoff] - [integral of a running cost]
Alberto Bressan (Penn State)
Noncooperative Games
5 / 33
Seek: Nash equilibrium solutions in feedback form
ui = ui∗ (t, x)
• Given the strategy u2 = u2∗ (t, x) adopted by the second player,
for every initial data (τ, y ), the assignment u1 = u1∗ (t, x) is a feedback solution to
optimal control problem for the first player :
Z
max
u1 (·)
!
T
ψ1 (x(T )) −
L1 (x, u1 , u2∗ (t, x)) dt
τ
subject to
ẋ = f (x, u1 , u2∗ (t, x)),
x(τ ) = y
• Similarly u2 = u2∗ (t, x) should provide a solution to the optimal control problem
for the second player, given that u1 = u1∗ (t, x)
Alberto Bressan (Penn State)
Noncooperative Games
6 / 33
The system of PDEs for the value functions
Vi (τ, y ) = value function for the i-th player
(= expected payoff, if game starts at τ, y )
Assume:

 f (x, u1 , u2 ) = f1 (x, u1 ) + f2 (x, u2 )

Li (x, u1 , u2 ) = Li1 (x, u1 ) + Li2 (x, u2 )
Optimal feedback controls:
ui∗ = ui∗ (t, x, ∇Vi ) = argmax
ω
∇Vi (t, x) · fi (x, ω) − Lii (x, ω)
The value functions satisfy a system of PDE’s
∂t Vi + ∇Vi · f (x, u1∗ , u2∗ ) = Li (x, u1∗ , u2∗ )
i = 1, 2
with terminal condition: Vi (T , x) = ψi (x)
Alberto Bressan (Penn State)
Noncooperative Games
7 / 33
Systems of Hamilton-Jacobi PDEs
Finite horizon game

 ∂t V1 = H (1) (x, ∇V1 , ∇V2 ) ,

∂t V2 = H (2) (x, ∇V1 , ∇V2 ) ,

 V1 (T , x) = ψ1 (x)

V2 (T , x) = ψ2 (x)
(backward Cauchy problem, with terminal conditions)
Alberto Bressan (Penn State)
Noncooperative Games
8 / 33
Test well-posedness: by locally linearizing of the equations
∂t Vi = H (i) (x, ∇V1 , ∇V2 )
perturbed solution:
(ε)
Vi
i = 1, 2
= Vi + ε Zi + o(ε)
Differentiating H (i) (x, p1 , p2 ), obtain a linear equation satisfied by Zi
∂t Zi =
∂H (i)
∂H (i)
(x, ∇V1 , ∇V2 ) · ∇Z1 +
(x, ∇V1 , ∇V2 ) · ∇Z2
∂p1
∂p2
Alberto Bressan (Penn State)
Noncooperative Games
9 / 33
Freezing the coefficients at a point (x̄, ∇V1 (x̄), ∇V2 (x̄)), one obtains a
linear system with constant coefficients
n
X
Z1
Z1
+
Aj
Z2 t
Z2 x
j=1
= 0
(1)
j
Each Aj is a 2 × 2 matrix
For a given vector ξ = (ξ1 , . . . , ξn ) ∈ Rn , consider the matrix
. X
ξj Aj
A(ξ) =
j
Definition 1. The system (1) is hyperbolic if there exists a constant C
such that
sup exp iA(ξ) ≤ C
ξ∈Rn
Alberto Bressan (Penn State)
Noncooperative Games
10 / 33
Computing solutions in terms of Fourier transform, which is
an isometry on L2 , the above definition is motivated by
Theorem 1. The system (1) is hyperbolic if and only if the corresponding
Cauchy problem is well posed in L2 (Rn ).
Z (t) 2
L
= b
Z (t)L2 ≤ sup exp(−iA(ξ)) · b
Z (0)L2
ξ∈Rn
= sup exp iA(ξ) · Z (0)L2
ξ∈Rn
Lemma 1 (necessary condition). If the system (1) is hyperbolic, then for every
ξ ∈ Rm the matrix A(ξ) has a basis of eigenvectors r1 , . . . , rn , with real
eigenvalues λ1 , . . . , λn (not necessarily distinct).
Lemma 2 (sufficient condition). Assume that, for |ξ| = 1, the matrices A(ξ)
can be diagonalized in terms of a real, invertible matrix R(ξ) continuously
depending on ξ. Then the system (1) is hyperbolic.
Alberto Bressan (Penn State)
Noncooperative Games
11 / 33
A class of differential games
payoffs:
dynamics:
Z
Ji = ψi (x(T )) −
T
ẋ = f1 (x, u1 ) + f2 (x, u2 )
Li1 (x, u1 ) + Li2 (x, u2 ) dt ,
i = 1, 2
0
Value functions satisfy

 ∂t V1 + ∇V1 · f (x, u1] , u2] )
= L1 (x, u1] , u2] )
 ∂ V + ∇V · f (x, u ] , u ] )
t 2
2
1 2
= L2 (x, u1] , u2] )
n
o
ui] = ui] (x, ∇Vi ) = argmax ∇Vi · fi (x, ω) − Lii (x, ω) ,
ω
V1 (T , x) = ψ1 (x) ,
Alberto Bressan (Penn State)
i = 1, 2
V2 (T , x) = ψ2 (x)
Noncooperative Games
12 / 33
∇Vi = pi = (pi1 , . . . pin )
f = (f1 , . . . , fn ),
Evolution of a perturbation:
Zi,t +
n
X
k=1
]
n X
2 X
∂uj
∂uj]
∂Li
∂f
fk Zi,xk +
−
Z1,xk +
Z2,xk = 0
∇Vi ·
∂uj
∂uj
∂p1k
∂p2k
k=1 j=1
Maximality conditions =⇒
∇V1 ·
∂f
∂L1
−
= 0,
∂u1
∂u1
Alberto Bressan (Penn State)
∇V2 ·
Noncooperative Games
∂f
∂L2
−
= 0
∂u2
∂u2
13 / 33
Evolution of a first order perturbation
Z1,t
Z2,t
+
n
X
Ak
k=1
Z1,xk
Z2,xk
=
0
0
where the 2 × 2 matrices Ak are given by

fk
Ak


= 
 
∇V2 ·
Alberto Bressan (Penn State)
∂f
∂L2
−
∂u1
∂u1
∂u1]
∂p1k
Noncooperative Games
∂f
∂L1
∇V1 ·
−
∂u2
∂u2
fk

∂u2]
∂p2k 




14 / 33

fk ξk
n 
X


A(ξ) =
 k=1 
∂f
∂L2 ∂u1]
∇V2 ·
−
ξk
∂u1
∂u1 ∂p1k
HYPERBOLICITY
=⇒
w = (w1 , . . . , wn ) ,
Alberto Bressan (Penn State)
∂L1
∂f
−
∂u2
∂u2
fk ξk

∂u2]
ξk 
∂p2k




A(ξ) has real eigenvalues for every ξ
∂f
∂L1 ∂u2]
∇V1 ·
−
∂u2
∂u2 ∂p2k
∂f
∂L2 ∂u1]
.
wk = ∇V2 ·
−
∂u1
∂u1 ∂p1k
.
vk =
v = (v1 , . . . , vn ) ,
HYPERBOLICITY
∇V1 ·
=⇒
(v · ξ)(w · ξ) ≥ 0
Noncooperative Games
for all ξ ∈ Rn
15 / 33
HYPERBOLICITY
=⇒
for all ξ ∈ Rn
(v · ξ)(w · ξ) ≥ 0
TRUE if v, w are linearly dependent, with same orientation.
FALSE if v, w are linearly independent.
w
ξ
w
v
ξ
v
Alberto Bressan (Penn State)
Noncooperative Games
16 / 33
In one space dimension, the Cauchy Problem can be well posed for a
large set of data.
In several space dimensions, generically the system is hyperbolic, and
the Cauchy Problem is ill posed
A.B., W.Shen, Small BV solutions of hyperbolic non-cooperative differential games,
SIAM J. Control Optim. 43 (2004), 194–215.
A.B., W.Shen, Semi-cooperative strategies for differential games, Intern. J. Game
Theory 32 (2004), 561–593.
A.B., Noncooperative differential games. Milan J. Math., 79 (2011), 357–427.
Alberto Bressan (Penn State)
Noncooperative Games
17 / 33
Differential games in infinite time horizon
Dynamics:
u1 , u2
ẋ = f (x, u1 , u2 ),
x(0) = x0
controls implemented by the players
Goal of i-th player:
maximize:
.
Ji =
+∞
Z
e −γt Ψi x(t), u1 (t), u2 (t) dt
0
(running payoff, exponentially discounted in time)
Alberto Bressan (Penn State)
Noncooperative Games
18 / 33
A special case
Dynamics: ẋ = f (x) + g1 (x)u1 + g2 (x)u2
Z
Player i seeks to minimize:
Ji =
∞
e
−γt
0
Alberto Bressan (Penn State)
Noncooperative Games
ui2 (t)
φi (x(t)) +
dt
2
19 / 33
A system of PDEs for the value functions
The value functions V1 , V2 for the two players satisfy the system of H-J
equations

 γV1 = (f · ∇V1 ) − 21 (g1 · ∇V1 )2 − (g2 · ∇V1 )(g2 · ∇V2 ) + φ1
 γV = (f · ∇V ) − 1 (g · ∇V )2 − (g · ∇V )(g · ∇V ) + φ
2
2
2
1
1
1
2
2
2 2
Optimal feedback controls:
ui∗ (x) = − ∇Vi (x) · gi (x)
i = 1, 2
nonlinear, implicit !
Alberto Bressan (Penn State)
Noncooperative Games
20 / 33
Linear - Quadratic games
Assume that the dynamics is linear:
ẋ = (Ax + b0 ) + b1 u1 + b2 u2 ,
x(0) = y
and the cost functions are quadratic:
Z
Ji =
0
+∞
u2 e −γt ai · x + x T Pi x + i dt
2
Then the system of PDEs has a special solution of the form
Vi (x) = ki + βi · x + x T Γi x
i = 1, 2
(∗)
optimal controls: ui∗ (x) = − (βi + 2x T Γi ) · bi
To find this solution, it suffices to
determine the coefficients ki , βi , Γi by solving a system of algebraic equations
Alberto Bressan (Penn State)
Noncooperative Games
21 / 33
Validity of linear-quadratic approximations ?
Assume the dynamics is almost linear
ẋ = f0 (x)+g1 (x)u1 +g2 (x)u2 ≈ (Ax +b0 )+b1 u1 +b2 u2 ,
x(0) = y
and the cost functions are almost quadratic
Z +∞
Z +∞
u2 ui2 −γt
dt ≈
e −γt ai · x + x T Pi x + i dt
Ji =
e
φi (x) +
2
2
0
0
Is it true that the nonlinear game has a feedback solution close to the
linear-quadratic game?
Alberto Bressan (Penn State)
Noncooperative Games
22 / 33
One-dimensional, linear-quadratic games in infinite time horizon
ẋ = (a0 x + b0 ) + b1 u1 + b2 u2
The ODE for the derivatives of the value functions ξi = Vi0 takes the form

A11

A21
 0


ξ1
ψ1 (x, ξ1 , ξ2 )
  = 
,
A22
ξ20
ψ2 (x, ξ1 , ξ2 )
A12
Aij = Aij (x, ξ1 , ξ2 )
The map (x, ξ1 , ξ2 ) 7→ det A(x, ξ1 , ξ2 ) is a homogeneous quadratic
polynomial
An affine solution exists: ξ1∗ (x) = k1 x + β1 ,
ξ2∗ (x) = k2 x + β2
The map x 7→ det A(x, ξ1∗ (x), ξ2∗ (x)) is a quadratic polynomial
Alberto Bressan (Penn State)
Noncooperative Games
23 / 33
Stability w.r.t. perturbations
(A.B. - Khai Nguyen, 2016)
on a bounded interval Ω = [a, b]
(positively invariant for the feedback dynamics)
on the whole real line
∗
ξ (x)
_
Γ
ξ∗(x)
∗
ξ (x)
x
x
Γ− =
Alberto Bressan (Penn State)
n
(x, ξ1 , ξ2 ) ;
x
o
det A(x, ξ1 , ξ2 ) ≤ 0
Noncooperative Games
24 / 33
Stability over a bounded interval
Easy case:
det A(x, ξ1∗ (x), ξ2∗ (x)) 6= 0 for all x ∈ R.


 0
ψ1 (x, ξ1 , ξ2 )
ξ1

  = A−1 (x, ξ1 , ξ2 ) 
ψ2 (x, ξ1 , ξ2 )
ξ20
The linear-quadratic game has a 2-parameter family of Nash equilibrium
solutions in feedback form. One is affine, the other are nonlinear.
All of the above solutions are stable w.r.t. small nonlinear perturbations of
the dynamics and the cost functions.
ξ
ξ*(x)
2
ξ
1
x
0
Alberto Bressan (Penn State)
Noncooperative Games
25 / 33
Case 2: det A vanishes at two points x̄1 < x̄2

A11

A21
  0


ξ1
ψ1 (x, ξ1 , ξ2 )
  = 

0
A22
ξ2
ψ2 (x, ξ1 , ξ2 )
A12
Equivalent system:
Setting:
A11 dξ1 + A12 dξ2 − ψ1 dx
A21 dξ1 + A22 dξ2 − ψ2 dx


−ψ1
. 
A11  ,
v =
A12
= 0
= 0


−ψ2
. 
A21  ,
w =
A22
We seek continuously differentiable functions x 7→ (ξ1 (x), ξ2 (x)) whose graph is
obtained by concatenating trajectories of the system

 

ẋ
A11 A22 − A12 A21
ξ˙1  = v × w =  A22 ψ1 − 12ψ2  .
A11 ψ2 − A21 ψ1
ξ˙2
Alberto Bressan (Penn State)
Noncooperative Games
26 / 33
det
A11
A21
A12
A22
=0
on
Σ 1 ∪ Σ2
Under generic conditions on the coefficients of the linear-quadratic problem, there
exists two curves γ1 ⊂ Σ1 , γ2 ⊂ Σ2 such that
v × w = 0 on γ1 and on γ2
v × w is vertical on Σ1 \ γ1 and on Σ2 \ γ2
Σ2
γ2
Σ1
ξ
1
ξ ∗(x)
γ1
ξ2
x
Alberto Bressan (Penn State)
Noncooperative Games
27 / 33
Three generic cases
one − parameter family of solutions
unique solution
γ2
γ
2
γ
γ
1
1
two − parameter family of solutions
γ2
γ
1
Alberto Bressan (Penn State)
Noncooperative Games
28 / 33
The saddle-saddle case
Σ2
ξ2
Σ1
M2
γ1
ξ*(x)
γ2
P2*
M1
P1*
ξ1
_
x1
_
x2
0
x
Under generic assumptions, a unique solution exists
Alberto Bressan (Penn State)
Noncooperative Games
29 / 33
Stability under perturbations, on the entire real line
(A.B., K.Nguyen, 2016)
ẋ = a0 x + f0 (x) + (b1 + h1 (x))u1 + (b2 + h2 (x))u2
Z +∞
u2
Ji =
e −γt Ri x + Si x 2 + ηi (x) + i dt
2
0
Under generic assumptions on the coefficients a0 , b1 , b2 , . . ., for any ε > 0 there
exists δ > 0 such that the following holds. If the perturbations satisfy
kf0 kC 2 + kη1 kC 2 + kη2 kC 2 + kh1 kC 1 + kh2 kC 1 ≤ δ,
then the perturbed equations for ξ1 = V10 , ξ2 = V20 admit a solution such that
|ξ1 (x) − ξ1∗ (x)| + |ξ2 (x) − ξ2∗ (x)| ≤ ε(1 + |x|)
Alberto Bressan (Penn State)
Noncooperative Games
for all x ∈ R
30 / 33
Future work: x ∈ R2
Vi = Vi (x1 , x2 )

 γV1
= (f · ∇V1 ) + 21 (g1 · ∇V1 )2 + (g2 · ∇V1 )(g2 · ∇V2 ) + φ1
γV2
= (f · ∇V2 ) + 12 (g2 · ∇V2 )2 + (g1 · ∇V1 )(g1 · ∇V2 ) + φ2

Linearize around the affine solution of a L-Q game
Determine if this linearized PDE is elliptic, hyperbolic, or mixed type
Construct solutions to the perturbed nonlinear PDE
x2
hyperbolic
elliptic
x1
Alberto Bressan (Penn State)
Noncooperative Games
31 / 33
1956
*
*
#
#
*
*
*
*
#
#
#
*
#
Happy Birthday
Jean Michel !!
Alberto Bressan (Penn State)
Noncooperative Games
32 / 33