
Robust Dynamic Programming in N Players Uncertain Differential Games

Abstract

In this paper we consider a non-cooperative N-player differential game affected by deterministic uncertainties. Sufficient conditions for the existence of a robust feedback Nash equilibrium are presented in the form of a set of min-max Hamilton–Jacobi–Bellman equations. These conditions are then used to find the robust Nash controls for a linear affine quadratic game affected by a square integrable uncertainty, which is seen as a malicious fictitious player trying to maximize the cost function of each player. The approach allows us to find robust strategies by solving a group of coupled Riccati differential equations. The finite, as well as infinite, time horizon cases are solved for this last game. As an illustration of the approach, the problem of coordinating a two-echelon supply chain with seasonal uncertain fluctuations in demand is developed.

1. Introduction

Differential games provide a suitable framework for modelling strategic interaction between different agents (known as players), each of whom seeks the minimization or, equivalently, the maximization of an individual criterion (Engwerda, 2005; Başar and Olsder, 1999). In such a multi-player scenario, no player is allowed to maximize his profits or objectives at the expense of the rest of the players. Therefore, the solution of the game is given in the form of an “equilibrium of forces”.

Among different types of solutions, the so-called Nash equilibrium is the most extensively used in the game theory literature. In this solution no player can improve his criterion by unilaterally deviating from his Nash strategy; therefore, no player has an incentive to change his decision. When the full state information is available to all the players to implement their decision strategies at each point in time, the solution is called a feedback Nash equilibrium (Engwerda, 2005; Başar and Olsder, 1999; Friedman, 1971). In order to find such feedback strategies, optimal control tools are applied; specifically, an equivalent N-player form of the Hamilton–Jacobi–Bellman (HJB) equation has to be solved for each of the players. In the non-cooperative Nash equilibrium framework, each player deals with a single-criterion optimization problem (the standard optimal control problem), with the actions of the remaining players fixed at their equilibrium values.
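To fix ideas, in the disturbance-free case the feedback Nash conditions reduce to $N$ coupled HJB equations; the following display is a standard sketch of that system (stated here under smoothness assumptions, with $u_j^{*}$ denoting the equilibrium feedback laws of the other players):

```latex
% N coupled HJB equations characterizing a feedback Nash equilibrium
% (disturbance-free case; smoothness of each V_i is assumed)
-\frac{\partial V_i}{\partial t}(t,x)
  =\min_{u_i\in U_i}\Bigl\{\nabla_x V_i(t,x)^{t}\,
     f\bigl(x,u_1^{*}(t,x),\dots,u_i,\dots,u_N^{*}(t,x),t\bigr)
     +g_i\bigl(x,u_i,u_{\hat\imath}^{*}(t,x),t\bigr)\Bigr\},
\qquad V_i(T,x)=h_i(x),\quad i=1,\dots,N.
```

The robust equations of Section 2 extend this system by an inner maximization over the disturbance.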

Although robustness is a central feature in control theory, there are not many studies of dynamic games affected by some sort of uncertainty or disturbance. Some recent developments on this topic can be mentioned. Jiménez-Lizárraga and Poznyak (2007) presented a notion of open loop Nash equilibrium (OLNE) where the parameters of the game lie within a finite set and the solution is given in terms of the worst-case scenario; that is, the result of applying a certain control input (in terms of the cost function value) is associated with the worst, or least favourable, value of the unknown parameter. The article of Jank and Kun (2002) also treats an OLNE and derives conditions for the existence and uniqueness of a worst case Nash equilibrium (WCNE); in this case, however, the uncertainty belongs to a Hilbert functional space and enters additively into the time derivative of the state variables. A similar problem is considered in a quite recent work (Engwerda, 2017), where the author shows that the WCNE can be derived by finding an OLNE of an associated differential game with 2N initial state constraints, and derives necessary and sufficient conditions for the solution of the finite time problem. The work of Jungers et al. (2008) deals with a game with polytopic uncertainties, reformulating the problem as a nonconvex coupling of semi-definite programs to find Nash-type controls. Other related approaches include using the Nash strategy to design robust controls for linear systems (Chen and Zhou, 2001), and viewing the uncertainties as an exogenous input, i.e. a fictitious player (Chen et al., 1997). In the work of van den Broek et al. (2003), the definition of equilibria is extended to a soft-constrained formulation, whose basis is given by Jank and Kun (2002), in which the fictitious player is introduced in the criteria via a weighting matrix.

In this work, inspired by the works of Jank and Kun (2002) and Engwerda (2005, 2017), we analyse a deterministic N-player non-zero sum differential game, considering finite, as well as infinite, time horizons in the performance index, and an $L_2$ perturbation which is regarded as a fictitious player trying to maximize the cost of each $i$-th player.

Assuming each player has access to the full state information, we are interested in finding robust feedback Nash strategies that guarantee a robust equilibrium when the players consider the worst case of the perturbation from their own points of view. To that end, a set of robust HJB equations is introduced; each of these equations computes not only the minimizing control of the $i$-th player, but also the maximizing, worst-case uncertainty from his point of view, resulting in a min-max form of the well-known HJB equations for an N-player game. To the best of the authors’ knowledge, such a robust HJB equation has not been used before to find a robust feedback Nash equilibrium in linear quadratic deterministic games, which stand as an important case to study. To summarize, the contributions of this work are as follows:

  • 1. Presentation of the general conditions for a robust worst-case feedback Nash equilibrium, by means of a robust form of the HJB equation, for N-player non-zero sum games.

  • 2. Based on this formulation, the solution of the finite time horizon linear affine quadratic uncertain game.

  • 3. The solution of the infinite time horizon case for linear affine dynamics.

  • 4. An illustration of the results on the problem of coordinating a two-echelon supply chain with seasonal uncertain fluctuations in demand. Such a case has not been treated before.

The development of this paper is as follows. In Section 2 we formally state the general problem of a differential game and the conditions for the robust Nash equilibrium to exist. Then, in Section 3 we define the dynamics of the analysed problem and the type of cost functional to be minimized over a finite time horizon, and we state a theorem, based on dynamic programming, to find the robust controls of each player. In Section 4 we analyse the infinite time horizon case. Finally, Section 5 presents a numerical example. The purpose of this last section is to show how to apply the formulas obtained in Sections 3 and 4, and then to compare our results against a finite time differential game which does not consider the perturbation in the solution of the problem (the commonly treated problem), while the system itself is affected by some sort of perturbation.

2. Problem Statement

In this section we exploit the principle of dynamic programming in order to find the robust feedback Nash equilibrium strategies of each player of a non-zero sum uncertain differential game. We begin by presenting the general sufficient conditions for such a robust equilibrium to exist. Towards that end, consider the following N-person uncertain differential game with initial pair $(s,y)\in[0,T]\times\mathbb{R}^{n\times1}$ described by the following initial value problem

(1)
$$\dot x(t)=f\bigl(x(t),(u_1(t),u_2(t),\dots,u_N(t)),w(t),t\bigr),\qquad x(s)=y,\quad\text{a.e. }t\in[s,T],\ T<+\infty,$$
where $x(t)\in\mathbb{R}^{n\times1}$ is the state column vector of the game and $u_i(t)\in\mathbb{R}^{l_i\times1}$ is the control strategy at time $t$ of each player $i$, which may run over a given control region $U_i\subseteq\mathbb{R}^{l_i\times1}$; $i$ denotes the player index, $i\in\{1,\dots,N\}$; $u_{\hat\imath}$ is the vector of strategies of the rest of the players, $\hat\imath$ being the counter-coalition of players counteracting the player with index $i$; and $w(t)\in\mathbb{R}^{q\times1}$ is a finite unknown disturbance in the sense that $\int_0^T\|w(t)\|^2dt<+\infty$, that is, $w$ is square integrable or, stated another way, $w\in L_2[0,T]$. The cost function, as the individual performance aim of each player, is
(2)
$$J_i(s,y,u_i,u_{\hat\imath},w):=\int_s^Tg_i\bigl(x(t),u_i(t),u_{\hat\imath}(t),w(t),t\bigr)\,dt+h_i\bigl(x(T)\bigr),$$
which contains an integral term as well as a terminal state term, and is given in the standard Bolza form.

Throughout the article we shall use the following notation:

  • $A^B$ is the set of functions from the set $A$ to the set $B$.

  • $A^t$ is the transpose of the matrix $A$.

  • $I_{N,i}:=\{k\in\mathbb{N}:1\le k\le N\text{ and }k\ne i\}$.

  • $U^i_{adm}[s_0,s_1]:=\{u_i\in[s_0,s_1]^{U_i}:u_i\text{ is measurable}\}$.

  • $U^i_{adm}:=U^i_{adm}[0,T]$ is the set of all admissible control strategies.

  • $U^{\hat\imath}_{adm}:=U^1_{adm}\times\cdots\times U^{i-1}_{adm}\times U^{i+1}_{adm}\times\cdots\times U^N_{adm}$.

  • $U_{adm}:=\prod_{i=1}^NU^i_{adm}$.

  • If $u\in U_{adm}$, then for $t\in[0,T]$, $u(t):=(u_1(t),u_2(t),\dots,u_N(t))$.

  • $D_if$ denotes the partial derivative of $f$ with respect to the $i$-th component.

  • $1_A$ denotes the indicator function of a set $A$.

Hypothesis 1.

The control region $U_i$ is a subset of $\mathbb{R}^{l_i\times1}$. The maps $f$, $g_i$ and $h_i$ are such that for all $(u_i,u_{\hat\imath},w)\in U^i_{adm}\times U^{\hat\imath}_{adm}\times L_2[0,T]$, equation (1) admits an a.e. unique solution and the function $J_i$ given in (2) is well defined; in general, we assume the conditions given by Yong and Zhou (1999, p. 159).

Remark 1.

We assume that the integrand $g_i$ given in equation (2) is positive definite; hence the cost function $J_i$ cannot take negative values.

2.1. Robust Feedback Nash Equilibrium

Next, we introduce the worst case uncertainty from the point of view of the $i$-th player, given the complete set of controls $u_j$, $j\in\{1,\dots,N\}$ (Jank and Kun, 2002; Engwerda, 2017):

(3)
$$J_i\bigl(s,y,u_i,u_{\hat\imath},w^{i,u_i,u_{\hat\imath}}\bigr):=\max_{w\in L_2[s,T]}J_i\bigl(s,y,u_i,u_{\hat\imath},w\bigr).$$

In this paper we want to extend the robust Nash equilibrium notion, previously introduced by Jank and Kun (2002) for an open loop information structure, to full state feedback information for an N-player game.

Definition 1.

The control strategies $u_1^{rn},u_2^{rn},\dots,u_N^{rn}$, with $(u_i^{rn})_{i=1}^N\in U_{adm}$, are said to form a robust feedback Nash equilibrium if, for any vector of admissible strategies

$$(u_i,u_{\hat\imath})\in U^i_{adm}\times U^{\hat\imath}_{adm},\qquad\text{for }i\in\{1,\dots,N\},$$
and assuming the existence of the corresponding maximizing uncertainty functions $w^{i,u_i,u_{\hat\imath}}\in L_2[0,T]$ from the point of view of the $i$-th player, the following set of inequalities holds:
(4)
$$J_i\bigl(s,y,u_i^{rn},u_{\hat\imath}^{rn},w^{i,u_i^{rn},u_{\hat\imath}^{rn}}\bigr)\le J_i\bigl(s,y,u_i,u_{\hat\imath}^{rn},w^{i,u_i,u_{\hat\imath}^{rn}}\bigr).$$
Under those conditions, we also say that $(u_1^{rn},u_2^{rn},\dots,u_N^{rn})$ is a vector of robust feedback Nash strategies for the whole set of players.

Hypothesis 2.

There is a unique vector of robust feedback Nash strategies for the whole set of players.

Now in order to find the robust feedback Nash equilibrium control strategies for the problem given by (2) subject to (1), we consider the following definition.

Definition 2.

Consider the $N$-tuple of strategies $(u_1,u_2,\dots,u_N)$ and the robust value function of the $i$-th player defined as:

(5)
$$V_i(s,y):=\min_{u_i\in U_i}J_i\bigl(s,y,u_i,u_{\hat\imath}^{rn},w^{i,u_i,u_{\hat\imath}^{rn}}\bigr),\quad\text{for }i\in\{1,2,\dots,N\},\qquad V_i\bigl(T,x(T)\bigr):=h_i\bigl(x(T)\bigr),$$
for any particular initial pair $(s,y)\in[0,T)\times\mathbb{R}^{n\times1}$. The function $V_i$ is also called the robust Bellman function.

Remark 2.

Notice that the minimization over $u_i$ considers that the rest of the players are fixed at their robust strategies (4), and each $w^{i,u_i,u_{\hat\imath}}$ satisfies (3).

2.2. Robust Dynamic Programming Equation

Let us explore the Bellman principle of optimality (Poznyak, 2008) for the robust value function Vi associated with the min-max posed problem for the i-th player, considering the rest of the participants as well as the signal function w fixed.

For $u_i\in U^i_{adm}$, let us take $v_i=1_{[s,\hat s)}u_i+1_{[\hat s,T)}u_i^{rn}$ and note that $v_i\in U^i_{adm}$. Using the Bellman principle of optimality for the functional $J_i(s,y,v_i,u^{rn}_{\hat\imath},\cdot)$, where $J_i$ is given in equation (2), and using also equation (5) of Definition 2, we have:

(6)
$$\begin{aligned}V_i(s,y)&\le J_i\bigl(s,y,v_i,u^{rn}_{\hat\imath},w^{i,v_i,u^{rn}_{\hat\imath}}\bigr)=\max_{w\in L_2[s,T]}J_i\bigl(s,y,v_i,u^{rn}_{\hat\imath},w\bigr)\\
&=\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+\int_{\hat s}^{T}g_i\bigl(x(t),u^{rn}_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+h_i\bigl(x(T)\bigr)\Bigr\}\\
&=\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x(\hat s)\bigr)\Bigr\},\end{aligned}$$
where the control strategies $u^{rn}_{\hat\imath}$ are the robust Nash controls defined in (4), and $x(\hat s)$ is such that $x$ fulfills (1) when $u_j=u_j^{rn}$ for $j\ne i$ and $w=w^{i,u_i,u^{rn}_{\hat\imath}}$, as described in Definition 1. Hence, taking the minimum of the right-hand side of (6) over $u_i$, the inequality yields
(7)
$$V_i(s,y)\le\min_{u_i\in U^i_{adm}[s,T]}\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x(\hat s)\bigr)\Bigr\}.$$
On the other hand, for any $\delta>0$ there is a control $u_{i,\delta}\in U^i_{adm}$ with the property:
(8)
$$V_i(s,y)+\delta\ge\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x_\delta(t),u_{i,\delta}(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x_\delta(\hat s)\bigr)\Bigr\},$$
where $x_\delta$ is the solution of (1) under the application of the control $u_{i,\delta}$, keeping the rest of the players fixed. Indeed, if there were a $\delta>0$ such that for any $u_i\in U^i_{adm}$ we had
$$V_i(s,y)+\delta<\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x(\hat s)\bigr)\Bigr\},$$
then, taking $u_i=u_i^{rn}$ and using the Bellman principle of optimality, we would obtain
$$\begin{aligned}V_i(s,y)+\delta&<\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u^{rn}_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x(\hat s)\bigr)\Bigr\}\\
&\le\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u^{rn}_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+\int_{\hat s}^{T}g_i\bigl(x(t),u^{rn}_i(t),u^{rn}_{\hat\imath}(t),w^{i,u^{rn}_i,u^{rn}_{\hat\imath}}(t),t\bigr)\,dt+h_i\bigl(x(T)\bigr)\Bigr\}=V_i(s,y),$$
arriving at a contradiction. So, from inequality (8), we get
(9)
$$V_i(s,y)+\delta\ge\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x_\delta(t),u_{i,\delta}(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x_\delta(\hat s)\bigr)\Bigr\}\ge\min_{u_i\in U^i_{adm}[s,T]}\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x(\hat s)\bigr)\Bigr\}.$$
Now, since in inequality (9) the value of $\delta$ is positive but arbitrary, we have
(10)
$$V_i(s,y)\ge\min_{u_i\in U^i_{adm}[s,T]}\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x(\hat s)\bigr)\Bigr\}$$
(Fattorini, 1999; Poznyak, 2008).

From inequalities (7) and (10), we arrive at the following theorem, which is a robust form of the dynamic programming equation for the problem under consideration.

Theorem 1.

Let the basic assumptions of Section 2 hold. Then, for any initial pair $(s,y)\in[0,T)\times\mathbb{R}^{n\times1}$, the following relationship holds:

(11)
$$V_i(s,y)=\min_{u_i\in U^i_{adm}[s,T]}\max_{w\in L_2[s,T]}\Bigl\{\int_s^{\hat s}g_i\bigl(x(t),u_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\,dt+V_i\bigl(\hat s,x(\hat s)\bigr)\Bigr\},$$
for all $\hat s\in[s,T]$.

Applying the principle of optimality to equation (11) leads immediately to the following result:

Theorem 2.

Consider the uncertain affine N-player differential game given by (1)–(2), where $T$ is finite and full state information is available. In this case the vector of control strategies $(u_i^{rn},u_{\hat\imath}^{rn})$ provides a robust feedback equilibrium if there exists a continuously differentiable function $V_i:[0,T]\times\mathbb{R}^{n\times1}\to\mathbb{R}$ satisfying the following partial differential equation:

(12)
$$-D_1V_i\bigl(t,x(t)\bigr)=\min_{u_i\in U^i_{adm}}\max_{w\in L_2[s,T]}\Bigl\{D_2V_i\bigl(t,x(t)\bigr)^t\,f\bigl(x(t),\hat u_i(t),w(t),t\bigr)+g_i\bigl(x(t),u_i(t),u^{rn}_{\hat\imath}(t),w(t),t\bigr)\Bigr\};\qquad V_i\bigl(T,x(T)\bigr)=h_i\bigl(x(T)\bigr),\quad\text{for }i\in\{1,\dots,N\},$$
where $\hat u_i(t)=\bigl(u_1^{rn}(t),\dots,u_{i-1}^{rn}(t),u_i(t),u_{i+1}^{rn}(t),\dots,u_N^{rn}(t)\bigr)$, and the corresponding min-max cost of each player is
(13)
$$J_i^{*}=V_i(s,y).$$

Remark 3.

The partial differential equation (12) of Theorem 2 is called the robust Hamilton–Jacobi–Bellman (RHJB) equation. In previous important works dealing with the design of robust $H_\infty$ controllers via a dynamic game approach (Başar and Bernhard, 2008; Aliyu, 2011), the min-max version of the value function was already found; there, when all the players are fixed at their robust Nash controls, the game becomes a zero-sum game played between the $i$-th player and the uncertainty. Equation (12) is an extension to the N-player non-zero sum game; to the best of our knowledge, however, this case has not been introduced yet.

3. Finite Time Horizon N-Player Linear Affine Quadratic Differential Game

Once the general conditions for the existence of a robust feedback Nash equilibrium in an uncertain differential game are established, we turn to the special case of linear affine quadratic differential games (LAQDG). In this section we consider the case where the time horizon is finite, that is, $T<+\infty$. The game is played by $N$ participants, each trying to minimize a certain loss inflicted by a disturbance; moreover, the cost functional of the game is constrained by the corresponding differential equation. Therefore, in this section we assume that:

(14)
$$f\bigl(x(t),u(t),w(t),t\bigr):=A(t)x(t)+\sum_{j=1}^{N}B_j(t)u_j(t)+E(t)w(t)+c(t),$$
$$x(t)\in\mathbb{R}^{n\times1},\quad x(0)=x_0,\quad u_j(t)\in\mathbb{R}^{l_j\times1},\quad 0\le t\le T<+\infty;$$
the cost functions of the players are given by the following quadratic functions:
(15)
$$g_i\bigl(x(t),u_i(t),u_{\hat\imath}(t),w(t),t\bigr):=x(t)^tQ_i(t)x(t)+\sum_{j=1}^{N}u_j(t)^tR_{i,j}(t)u_j(t)-w(t)^tW_i(t)w(t),\qquad h_i\bigl(x(T)\bigr):=x(T)^tQ_i^fx(T),$$
where $j$ represents the player index; $A(t)\in\mathbb{R}^{n\times n}$ and $B_j(t)\in\mathbb{R}^{n\times l_j}$, for $j\in\{1,\dots,N\}$, are the known system and control matrices; $x(t)$ is the state vector of the game and $u_j$ is the control strategy of the $j$-th player; $c(t)\in\mathbb{R}^{n\times1}$ is an exogenous and known signal. In this case $w$ is the same as in (1), that is, a finite disturbance entering the system through the matrix $E(t)\in\mathbb{R}^{n\times q}$. The performance index of each $i$-th player is again given in standard Bolza form; the strategy of player $i$ is $u_i$, while $u_{\hat\imath}$ collects the strategies of the rest of the players. The term $w(t)^tW_i(t)w(t)$ weights the unknown uncertainty, which is trying to maximize the cost $J_i$ from the point of view of the $i$-th player. The cost matrices are assumed to satisfy $Q_i(t)=Q_i(t)^t\ge0$, $Q_i^f=(Q_i^f)^t\ge0$ and $W_i(t)=W_i(t)^t>0$ (symmetric and positive semidefinite/positive definite matrices); $R_{i,i}(t)=R_{i,i}(t)^t>0$ and $R_{i,j}(t)=R_{i,j}(t)^t\ge0$, where the latter inequalities are understood component by component. Assume also that the players have access to the full state information pattern, that is, they measure $x(t)$ for all $t\in[0,T]$. All the involved square matrices are assumed to be non-singular.

For the linear affine dynamics given in (14), equation (12) can be rewritten as follows:

(16)
$$\begin{aligned}-D_1V_i\bigl(t,x(t)\bigr)=\min_{u_i\in U^i_{adm}}\max_{w\in L_2[0,T]}\Bigl\{&D_2V_i\bigl(t,x(t)\bigr)^t\Bigl(A(t)x(t)+B_i(t)u_i(t)+\sum_{j\in I_{N,i}}B_j(t)u_j^{rn}(t)+E(t)w(t)+c(t)\Bigr)\\
&+x(t)^tQ_i(t)x(t)+u_i(t)^tR_{i,i}(t)u_i(t)+\sum_{j\in I_{N,i}}u_j^{rn}(t)^tR_{i,j}(t)u_j^{rn}(t)-w(t)^tW_i(t)w(t)\Bigr\}\end{aligned}$$
with terminal condition $V_i\bigl(T,x(T)\bigr)=x(T)^tQ_i^fx(T)$. With this condition, and if the assumptions mentioned above are satisfied, the robust feedback Nash equilibrium can be directly obtained as
(17)
$$u_i^{rn}=\arg\min_{u_i\in U^i_{adm}}\Bigl\{D_2V_i\bigl(\cdot,x(\cdot)\bigr)^t\Bigl(Ax+B_iu_i+\sum_{j\in I_{N,i}}B_ju_j^{rn}+Ew^{i,u_i,u^{rn}_{\hat\imath}}+c\Bigr)+x^tQ_ix+u_i^tR_{i,i}u_i+\sum_{j\in I_{N,i}}\bigl(u_j^{rn}\bigr)^tR_{i,j}u_j^{rn}-\bigl(w^{i,u_i,u^{rn}_{\hat\imath}}\bigr)^tW_iw^{i,u_i,u^{rn}_{\hat\imath}}\Bigr\},$$
and the worst case uncertainty from the point of view of the $i$-th player is obtained as
(18)
$$w^{i,u_i,u_{\hat\imath}}=\arg\max_{w\in L_2[0,T]}\Bigl\{D_2V_i\bigl(\cdot,x(\cdot)\bigr)^t\Bigl(Ax+\sum_{j=1}^{N}B_ju_j+Ew+c\Bigr)+x^tQ_ix+\sum_{j=1}^{N}u_j^tR_{i,j}u_j-w^tW_iw\Bigr\}.$$

Remark 4.

Notice that the value of $w^{i,u_i,u_{\hat\imath}}$ given in (18) does not depend on $(u_i,u_{\hat\imath})$. So, in this particular case, we shall denote this value simply by $w^i$.

Theorem 3.

The robust feedback Nash strategies for the uncertain LQ affine game (14)–(15) have the following linear form:

(19)
$$u_i^{rn}=-R_{i,i}^{-1}B_i^t\bigl(P_ix+m_i\bigr)$$
and the worst case uncertainty from the point of view of the $i$-th player is:
(20)
$$w^i=W_i^{-1}E^t\bigl(P_ix+m_i\bigr),$$
where the set of $N$ coupled Riccati-type equations for the $P_i$ satisfies the following boundary value problem:
(21)
$$-\dot P_i=\tilde A_i^tP_i+P_i\tilde A_i+Q_i-P_iS_iP_i+P_iM_iP_i+\sum_{j\in I_{N,i}}P_jS_{j,i}P_j;\qquad P_i(T)=Q_i^f,\quad \tilde A_i:=A-\sum_{j\in I_{N,i}}S_jP_j,$$
where $S_i:=B_iR_{i,i}^{-1}B_i^t$, $S_{j,i}:=B_jR_{j,j}^{-1}R_{i,j}R_{j,j}^{-1}B_j^t$ and $M_i:=EW_i^{-1}E^t$, for $i,j\in\{1,\dots,N\}$; and the $m_i$ are the “shifting vectors” governed by the following coupled linear differential equations:
(22)
$$-\dot m_i=A^tm_i-P_iS_im_i-\sum_{j\in I_{N,i}}P_iS_jm_j-\sum_{j\in I_{N,i}}P_jS_jm_i+P_iM_im_i+P_ic+\sum_{j\in I_{N,i}}P_jS_{j,i}m_j;\qquad m_i(T)=0,$$
and the value of the robust Nash cost is:
(23)
$$J_i^{*}=x(0)^tP_i(0)x(0)+2m_i(0)^tx(0)+\int_0^T\Bigl(-m_i^tS_im_i-2\sum_{j\in I_{N,i}}m_i^tS_jm_j+m_i^tM_im_i+\sum_{j\in I_{N,i}}m_j^tS_{j,i}m_j+2m_i^tc\Bigr)dt,$$
where $J_i^{*}$ is the optimal value of (2).

The proof of this theorem is presented in Appendix A.
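As a purely illustrative numerical sketch of how the boundary value problem (21)–(22) can be handled, the snippet below integrates the coupled Riccati equations backward in time with SciPy for a hypothetical scalar two-player game (all data are invented for this example and are not those of Section 5; the right-hand side is written in the expanded form in $A$, which avoids the shorthand $\tilde A_i$):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical scalar two-player data (illustrative assumptions only)
A, B1, B2, E = 0.1, 1.0, 1.0, 0.5
Q1 = Q2 = 1.0
R11 = R22 = 1.0       # R_{i,i}: weight on the player's own control
R12 = R21 = 0.1       # R_{i,j}: weight on the other player's control
W1 = W2 = 5.0         # disturbance attenuation weights, W_i > 0
Q1f = Q2f = 0.0       # terminal conditions P_i(T) = Q_i^f
T = 5.0

S1, S2 = B1**2 / R11, B2**2 / R22   # S_i = B_i R_{i,i}^{-1} B_i^t
S21 = B2**2 * R12 / R22**2          # S_{2,1} = B_2 R_{2,2}^{-1} R_{1,2} R_{2,2}^{-1} B_2^t
S12 = B1**2 * R21 / R11**2
M1, M2 = E**2 / W1, E**2 / W2       # M_i = E W_i^{-1} E^t

def riccati_rhs(t, P):
    """Coupled robust Riccati ODEs (scalar case, expanded form in A)."""
    P1, P2 = P
    dP1 = -(2*A*P1 + Q1 - S1*P1**2 - 2*S2*P1*P2 + M1*P1**2 + S21*P2**2)
    dP2 = -(2*A*P2 + Q2 - S2*P2**2 - 2*S1*P1*P2 + M2*P2**2 + S12*P1**2)
    return [dP1, dP2]

# Integrate backward from t = T to t = 0 (a decreasing t_span does this)
sol = solve_ivp(riccati_rhs, (T, 0.0), [Q1f, Q2f], rtol=1e-8, atol=1e-10)
P1_0, P2_0 = sol.y[0, -1], sol.y[1, -1]

# Feedback gain at t = 0 (with c = 0 the shifting vectors m_i vanish)
K1 = -B1 * P1_0 / R11   # u_1 = -R_{1,1}^{-1} B_1^t (P_1 x + m_1)
```

Because the two players are symmetric here, $P_1$ and $P_2$ coincide; with an exogenous signal $c(t)\ne0$, the linear equations (22) for the shifting vectors would be integrated backward in the same way.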

4. Infinite Time Horizon Case

In this section we consider the same linear affine quadratic game when the time horizon is infinite. As in the case analysed in the previous section, the players are trying to minimize a certain loss inflicted by a disturbance; moreover, the cost functional of the game is constrained by a differential equation that includes an affine term. In this type of game the cost functional is given by:

(24)
$$J_i(t,x,u_i,u_{\hat\imath},w)=\int_0^{+\infty}\Bigl(x^tQ_ix+\sum_{j=1}^{N}u_j^tR_{j,i}u_j-w^tW_iw\Bigr)dt,$$
and the constraint has the following form:
(25)
$$\dot x(t)=Ax(t)+\sum_{j=1}^{N}B_ju_j(t)+Ew(t)+c(t).$$
The involved matrices are constant, with corresponding dimensions, and the matrices in (24) satisfy restrictions equivalent to those of the finite time counterpart. Following Engwerda (2005), we assume that $c\in L^2_{\exp,loc}$, that is, $c$ is locally square integrable and converges to zero exponentially. In this case, the system of algebraic Riccati equations takes the form:
(26)
$$\tilde A_i^tP_i+P_i\tilde A_i+Q_i-P_iS_iP_i+P_iM_iP_i+\sum_{j\in I_{N,i}}P_jS_{j,i}P_j=0,$$
where $\tilde A_i:=A-\sum_{j\in I_{N,i}}S_jP_j$.

To find the solution of this problem, the completion-of-squares method (Poznyak, 2008) is applied, and the following theorem is stated.

Theorem 4.

For the differential game problem given by the equations (24)–(25), if the algebraic Riccati equations (26) possess symmetric stabilizing solutions Pi, then the infinite time horizon Robust Nash Equilibrium strategies are given by

(27)
$$u_i(t)=-R_{i,i}^{-1}B_i^t\bigl(P_ix(t)+m_i(t)\bigr),$$
and the worst case will be given by
(28)
$$w_i(t)=W_i^{-1}E^t\bigl(P_ix(t)+m_i(t)\bigr),$$
where each $m_i$ fulfills the equation
(29)
$$m_i(t)=\int_t^{+\infty}e^{\bigl(A_i-\sum_{j\ne i}(P_jS_{j,i}-P_iS_j)\bigr)(t-s)}P_ic(s)\,ds,$$
and
$$A_i=A^t-\sum_{j=1}^{N}P_jS_j+\sum_{j=1}^{N}P_jS_{j,i}+P_iM_i.$$
Moreover, the optimal value $J_i^{*}$ is given by
(30)
$$J_i^{*}=x(0)^tP_ix(0)+2m_i(0)^tx(0)+n_i(0)$$
and the closed-loop state equation has the form
(31)
$$\dot x(t)=\Bigl(A-\sum_{j=1}^{N}B_jR_{j,j}^{-1}B_j^tP_j\Bigr)x(t)-\sum_{j=1}^{N}B_jR_{j,j}^{-1}B_j^tm_j(t)+Ew(t)+c(t).$$

The proof of this theorem is found in Appendix A.
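For small examples, a direct way to obtain a stabilizing solution of the coupled algebraic Riccati equations (26) is a numerical root finder. The sketch below uses hypothetical scalar two-player data (invented values, not the supply-chain example), writes the equations in expanded form in $A$, and checks that the closed-loop drift is stable:

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical scalar two-player data (illustrative assumptions only)
A, B1, B2, E = 0.1, 1.0, 1.0, 0.5
Q1 = Q2 = 1.0
R11 = R22 = 1.0
R12 = R21 = 0.1
W1 = W2 = 5.0

S1, S2 = B1**2 / R11, B2**2 / R22
S21 = B2**2 * R12 / R22**2
S12 = B1**2 * R21 / R11**2
M1, M2 = E**2 / W1, E**2 / W2

def coupled_are(P):
    """Residuals of the coupled algebraic Riccati equations (expanded scalar form)."""
    P1, P2 = P
    return [2*A*P1 + Q1 - S1*P1**2 - 2*S2*P1*P2 + M1*P1**2 + S21*P2**2,
            2*A*P2 + Q2 - S2*P2**2 - 2*S1*P1*P2 + M2*P2**2 + S12*P1**2]

P1, P2 = fsolve(coupled_are, [1.0, 1.0])   # start from a positive guess
A_cl = A - S1*P1 - S2*P2                    # closed-loop drift
# A stabilizing solution requires A_cl < 0; the strategies are then
# u_i = -R_{i,i}^{-1} B_i (P_i x + m_i), as in (27)
```

Here `fsolve` is only one pragmatic choice; backward integration of the differential Riccati equations until convergence is a common alternative route to the stabilizing solution.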

5. Numerical Example: A Differential Game Model for a Vertical Marketing System with Demand Fluctuation and Seasonal Prices

Consider a noncooperative game in a two-echelon supply chain established between two chain agents (Dockner et al., 2000; Jørgensen, 1986): a single supplier (called the manufacturer) and a single distributor (called the retailer). The manufacturer is in charge of selling one type of product to a single retailer over a period of time $T$ at the price $p_1(t)$. The retailer is in charge of distributing and marketing that product at a price $p_2(t)=p_1(t)+r_2(t)$, where $r_2(t)$ represents the profit margin gained by the retailer at time $t$ for each unit sold. In this case, let us set $r_2=0.2p_1$.

The dynamics of the game are established by both players searching for a Nash equilibrium in their coordination contract while facing some sources of uncertainty. For this particular case, assume that the retailer deals with a demand that evolves exogenously over time, with the quantity sold per time unit, $d$, depending not only on the price $p_1$ but also on the elapsed time $t$, $d=d(p_1,t)$. The exogenous change in demand presented here is due to seasonal fluctuations. In such an environment, the profit equations of the players are $J_1$ and $J_2$, with the following quadratic structure:

(32)
$$J_1=F_1^fx_1^2(26)+\int_0^{26}\Bigl(\frac{c_1(t)}{2}u_1^2(t)+\frac{h_1}{2}x_1^2(t)\Bigr)dt,$$
(33)
$$J_2=F_2^fx_2^2(26)+\int_0^{26}\Bigl(-p_2w^2(t)+\Bigl(p_1+\frac{c_2(t)}{2}\Bigr)u_2^2(t)+\frac{h_2}{2}x_2^2(t)\Bigr)dt,$$
subject to the following dynamics
(34)
$$\dot x_1=u_1-u_2,\qquad \dot x_2=u_2-d-ew,$$
where $J_1$ represents the operating cost faced by the manufacturer, given by the holding cost and the production cost, plus a small penalization of the inventories at the final time of the horizon. On the other hand, $J_2$ represents the operating cost incurred by the retailer, consisting of the holding cost, the purchasing cost (including the price paid to the manufacturer for the products), the perturbation signal $w$, seen as a malicious fictitious player, and a small penalization of the inventories at the final time of the horizon. The game involves the dynamic changes of the inventories of the two players $(x_1,x_2)$, with the production rates $(u_1,u_2)$ as decision variables. Moreover, the retailer's dynamics face an uncertain demand represented by two terms: the deterministic demand $d$ plus the uncertain factor $ew$.
$$A=\begin{pmatrix}0&0\\0&0\end{pmatrix},\quad B_1=\begin{pmatrix}1\\0\end{pmatrix},\quad B_2=\begin{pmatrix}-1\\1\end{pmatrix},\quad D=\begin{pmatrix}0\\d\end{pmatrix},\quad F_1^f=\begin{pmatrix}8&0\\0&0\end{pmatrix},\quad F_2^f=\begin{pmatrix}0&0\\0&8\end{pmatrix},\quad E=\begin{pmatrix}0\\1\end{pmatrix},$$
$$R_{11}=\frac{c_1}{2},\quad R_{22}=p_1+\frac{c_2}{2},\quad Q_1=\begin{pmatrix}\frac{h_1}{2}&0\\0&0\end{pmatrix},\quad Q_2=\begin{pmatrix}0&0\\0&\frac{h_2}{2}\end{pmatrix},$$
where
$$c_1=0.85p_1,\quad p_2=2p_1,\quad c_2=2,\quad h_1=15,\quad h_2=10.5,$$
(35)
$$p_1(t)=\begin{cases}\frac15(6t+90)&\text{for }0\le t\le5,\\-3t+39&\text{for }5<t\le6,\\\frac92t-6&\text{for }6<t\le8,\\-15t+150&\text{for }8<t\le9,\\\frac{20}{3}t-45&\text{for }9<t\le12,\end{cases}$$
(36)
$$d(t)=\begin{cases}-5p_1(t)+135&\text{for }0\le t<5,\\-12p_1(t)+303&\text{for }5<t\le6,\\-\frac{26}{9}p_1(t)+\frac{1005}{9}&\text{for }6<t\le8,\\-\frac{37}{15}p_1(t)+\frac{1485}{15}&\text{for }8<t\le9,\\-5p_1(t)+135&\text{for }9<t\le12.\end{cases}$$

Fig. 1. Price vs perturbed demand.

Fig. 2. Riccati differential equation, player 1 (manufacturer).

Fig. 3. Riccati differential equation, player 2 (retailer).

Fig. 4. Comparison between manufacturer produced units ($u_1$), units demanded by the retailer ($u_2$), and units left in the manufacturer stock ($x_1$).

According to the game equations (1) and (2), $N=2$. We used Matlab to solve (21) numerically backward in time, thus obtaining the corresponding robust Nash equilibrium strategies of each player. The results of this numerical solution are shown in Figs. 2 and 3. Fig. 1 depicts the perturbed demand and the manufacturer's price. Fig. 4 shows the behaviour of the decision variables of each player and of the manufacturer's state equation ($u_1$ the manufacturer's production rate, $u_2$ the retailer's purchasing rate, and the manufacturer's inventory $x_1$). Through this figure we can compare the different outputs; for instance, we observe that the products left in the manufacturer's stock stay close to zero. In fact, this figure shows the advantage of better coordination between the different chain agents in order to reduce the bullwhip effect. Since the manufacturer and the retailer share information about customer demand, the goods produced by the manufacturer and the goods purchased by the retailer exhibit similar behaviour.

Also, since there are no restrictions on the states of a given stage in the chain, we can see that, at times, these variables take negative values. For example, between $t=8$ and $t=16$ the units left in stock become negative; this only means that the manufacturer has backlogged units. However, we can appreciate that the amount of backlogged units is minimal. Also, towards the closing of the season, between $t=20$ and $t=25$, it is better for the manufacturer to have only backlogged units. Once a Nash equilibrium is reached, any deviation from the output policies would result in a loss for the manufacturer or the retailer.

Fig. 5. Comparison between retailer bought units ($u_2$), units demanded by the final consumer ($D$), and units left in the retailer stock ($x_2$).

On the other hand, Fig. 5 shows the behaviour of the retailer's dynamics through the time horizon. We can appreciate that the strategy followed by the retailer differs from the manufacturer's in that the retailer uses inventory to face demand uncertainties. The retailer considers the worst case of any perturbation on demand, but stock units are kept to a minimum. The decisions at the end of the planning horizon are distorted by the finite time horizon condition; for that reason, the planning horizon was extended to two years in order to avoid such distortions during the first year.

6. Conclusions

We found the robust Nash equilibrium control functions of an N-player differential game affected by an $L_2$ uncertainty function, for a linear quadratic affine performance function, in two cases:

  • 1. When we have a finite time horizon. In this case we assume the matrices involved in the performance function are time dependent.

  • 2. When we have an infinite time horizon. In this case we assume that the matrices involved in the performance function are constant in time, and that only the uncertainty function, the state function, and the affine term of the constraint given by the linear differential equation depend on time.

Both problems are solved in this work using different methods.

Appendix A

Proof.

Proof of Theorem 3. To start the proof, we find $u_i^{rn}$ by means of (17) and the fact that

(37)
$$D_2V_i\bigl(t,x(t)\bigr)B_i(t)+2u_i(t)^tR_{i,i}(t)=0\;\Longrightarrow\;u_i(t)=-\tfrac12R_{i,i}^{-1}(t)B_i(t)^t\bigl(D_2V_i(t,x(t))\bigr)^t,$$
and we find $w^i$, given in Remark 4, by means of (18) and the fact that
(38)
$$D_2V_i\bigl(t,x(t)\bigr)E(t)-2w(t)^tW_i(t)=0\;\Longrightarrow\;w(t)=\tfrac12W_i^{-1}(t)E(t)^t\bigl(D_2V_i(t,x)\bigr)^t.$$
Substituting (37) and (38) back in (16) we get
(39)
$$\begin{aligned}-D_1V_i(\cdot,x)=&\,D_2V_i(\cdot,x)\Bigl(Ax-\tfrac12B_iR_{i,i}^{-1}B_i^t\bigl(D_2V_i(\cdot,x)\bigr)^t-\tfrac12\sum_{j\in I_{N,i}}B_jR_{j,j}^{-1}B_j^t\bigl(D_2V_j(\cdot,x)\bigr)^t+\tfrac12EW_i^{-1}E^t\bigl(D_2V_i(\cdot,x)\bigr)^t+c\Bigr)\\
&+x^tQ_ix+\tfrac14\sum_{j=1}^{N}D_2V_j(\cdot,x)B_jR_{j,j}^{-1}R_{i,j}R_{j,j}^{-1}B_j^t\bigl(D_2V_j(\cdot,x)\bigr)^t-\tfrac14D_2V_i(\cdot,x)EW_i^{-1}E^t\bigl(D_2V_i(\cdot,x)\bigr)^t.\end{aligned}$$
Now we have to solve the first order partial differential equation (39); to do that we propose the solution
(40)
$$V_i\bigl(t,x(t)\bigr)=x(t)^tP_i(t)x(t)+2m_i(t)^tx(t)+n_i(t).$$
So
(41)
$$D_2V_i\bigl(t,x(t)\bigr)=2x(t)^tP_i(t)+2m_i(t)^t\quad\text{and}$$
(42)
$$D_1V_i\bigl(t,x(t)\bigr)=x(t)^t\dot P_i(t)x(t)+2\dot m_i(t)^tx(t)+\dot n_i(t).$$

Substituting (41) and (42) into (39), expanding, and grouping terms of the forms $x^t(Y)x$ and $x^t(Z)$, yields

(43)
$$\begin{aligned}&x^t\Bigl(\dot P_i+\Bigl(A-\sum_{j\in I_{N,i}}B_jR_{j,j}^{-1}B_j^tP_j\Bigr)^tP_i+P_i\Bigl(A-\sum_{j\in I_{N,i}}B_jR_{j,j}^{-1}B_j^tP_j\Bigr)+Q_i-P_iB_iR_{i,i}^{-1}B_i^tP_i+P_iEW_i^{-1}E^tP_i+\sum_{j\in I_{N,i}}P_jB_jR_{j,j}^{-1}R_{i,j}R_{j,j}^{-1}B_j^tP_j\Bigr)x\\
&+2x^t\Bigl(\dot m_i+A^tm_i-P_iB_iR_{i,i}^{-1}B_i^tm_i-\sum_{j\in I_{N,i}}P_iB_jR_{j,j}^{-1}B_j^tm_j-\sum_{j\in I_{N,i}}P_jB_jR_{j,j}^{-1}B_j^tm_i+P_iEW_i^{-1}E^tm_i+P_ic+\sum_{j\in I_{N,i}}P_jB_jR_{j,j}^{-1}R_{i,j}R_{j,j}^{-1}B_j^tm_j\Bigr)\\
&+\Bigl(\dot n_i-m_i^tB_iR_{i,i}^{-1}B_i^tm_i-2\sum_{j\in I_{N,i}}m_i^tB_jR_{j,j}^{-1}B_j^tm_j+m_i^tEW_i^{-1}E^tm_i+\sum_{j\in I_{N,i}}m_j^tB_jR_{j,j}^{-1}R_{i,j}R_{j,j}^{-1}B_j^tm_j+2m_i^tc\Bigr)=0.\end{aligned}$$
This last equation is satisfied when (21) and (22) have solutions; for the terminal conditions we have
$$V_i(T,x)=x(T)^tP_i(T)x(T)+2m_i(T)^tx(T)+n_i(T)=x(T)^tQ_i^fx(T),$$
implying the terminal conditions of (21) and (22). The value of the optimal functional cost is
(44)
$$J_i^{*}=x(0)^tP_i(0)x(0)+2m_i(0)^tx(0)+n_i(0),$$
so the theorem is proven.  □

Proof.

Proof of Theorem 4. To develop the proof of this theorem, let us suppose there exists an “energetic function” $V_i:[0,+\infty)\times\mathbb{R}^{n\times1}\to\mathbb{R}$ of the form

$$V_i(t,x)=x^tP_ix+2m_i(t)^tx+n_i(t),$$
where $m_i$ is defined in (29) and $n_i$ is given by
(45)
$$n_i(t)=\int_t^{+\infty}\Bigl(-m_i^tS_im_i-2\sum_{j\in I_{N,i}}m_i^tS_jm_j+m_i^tM_im_i+\sum_{j\in I_{N,i}}m_j^tS_{j,i}m_j+2m_i^tc\Bigr)ds.$$

Let $\hat V_i:[0,+\infty)\to\mathbb{R}$ be the function given by

$$\hat V_i(t):=V_i\bigl(t,x(t)\bigr).$$
We start by taking the derivative of $\hat V_i$, obtaining
(46)
$$\dot{\hat V}_i=2x^tP_i\dot x+2\dot m_i^tx+2m_i^t\dot x+\dot n_i.$$
Now the fundamental theorem of calculus tells us
(47)
$$\hat V_i(T)-\hat V_i(0)=\int_0^T\dot{\hat V}_i(t)\,dt.$$
Substituting the value of x˙(t) given in (31) into (46) and so into (47) we get
(48)
$$\begin{aligned}\hat V_i(T)-\hat V_i(0)=\int_0^T\Bigl(&2x(t)^tP_i\Bigl(Ax(t)+\sum_{j=1}^{N}B_ju_j(t)+Ew(t)+c(t)\Bigr)+2\dot m_i(t)^tx(t)\\
&+2m_i(t)^t\Bigl(Ax(t)+\sum_{j=1}^{N}B_ju_j(t)+Ew(t)+c(t)\Bigr)+\dot n_i(t)\Bigr)dt.\end{aligned}$$
Now, adding and subtracting $x^tQ_ix$, $u_i^tR_{i,i}u_i$ and $w^tW_iw$, we can express (48) as
(49)
$$\begin{aligned}\hat V_i(T)-\hat V_i(0)=\int_0^T\Bigl(\Bigl(&2x(t)^tP_iAx(t)+x^tQ_ix+u_i(t)^tR_{i,i}u_i(t)+2x(t)^tP_i\sum_{j=1}^{N}B_ju_j(t)+2x(t)^tP_ic(t)\\
&+2\dot m_i(t)^tx(t)+2m_i(t)^tAx(t)+2m_i(t)^t\sum_{j=1}^{N}B_ju_j(t)+2m_i(t)^tc(t)+\dot n_i(t)\Bigr)\\
&+\bigl(2x(t)^tP_iEw(t)+2m_i(t)^tEw(t)-w(t)^tW_iw(t)\bigr)\Bigr)dt\\
&-\int_0^T\bigl(x(t)^tQ_ix(t)+u_i(t)^tR_{i,i}u_i(t)-w(t)^tW_iw(t)\bigr)dt.\end{aligned}$$
The second brace can be expressed as a difference of squares as follows:
(50)
$$2\bigl(x^tP_iE+m_i^tE\bigr)w-w^tW_iw=2\bigl(x^tP_iEW_i^{-\frac12}+m_i^tEW_i^{-\frac12}\bigr)W_i^{\frac12}w-\bigl\|w^tW_i^{\frac12}\bigr\|^2=-\bigl\|x^tP_iEW_i^{-\frac12}+m_i^tEW_i^{-\frac12}-w^tW_i^{\frac12}\bigr\|^2+\bigl\|x^tP_iEW_i^{-\frac12}+m_i^tEW_i^{-\frac12}\bigr\|^2.$$
Now, dealing with the first brace after the equal sign on the right side of (49), we proceed as follows:
(51)
$$2\bigl(x^tP_iB_i+m_i^tB_i\bigr)u_i+u_i^tR_{i,i}u_i=2\bigl(x^tP_iB_iR_{i,i}^{-\frac12}+m_i^tB_iR_{i,i}^{-\frac12}\bigr)R_{i,i}^{\frac12}u_i+\bigl\|u_i^tR_{i,i}^{\frac12}\bigr\|^2=\bigl\|x^tP_iB_iR_{i,i}^{-\frac12}+m_i^tB_iR_{i,i}^{-\frac12}+u_i^tR_{i,i}^{\frac12}\bigr\|^2-\bigl\|x^tP_iB_iR_{i,i}^{-\frac12}+m_i^tB_iR_{i,i}^{-\frac12}\bigr\|^2.$$
Inserting (50) and (51) into (49), and defining $S_i:=B_iR_{i,i}^{-1}B_i^t$ and $M_i:=EW_i^{-1}E^t$, we get
(52)
$$\begin{aligned}\hat V_i(T)-\hat V_i(0)=\int_0^T\Bigl(&x^t\bigl(P_iA+A^tP_i+Q_i-P_iS_iP_i+P_iM_iP_i\bigr)x+2x^t\bigl(\dot m_i+A^tm_i-P_iS_im_i+P_iM_im_i+P_ic\bigr)\\
&+\bigl(\dot n_i+m_i^tM_im_i-m_i^tS_im_i+2m_i^tc\bigr)+2x^tP_i\sum_{j\in I_{N,i}}B_ju_j+2m_i^t\sum_{j\in I_{N,i}}B_ju_j\\
&+\bigl\|x^tP_iB_iR_{i,i}^{-\frac12}+m_i^tB_iR_{i,i}^{-\frac12}+u_i^tR_{i,i}^{\frac12}\bigr\|^2-\bigl\|x^tP_iEW_i^{-\frac12}+m_i^tEW_i^{-\frac12}-w^tW_i^{\frac12}\bigr\|^2\Bigr)dt\\
&-\int_0^T\bigl(x^tQ_ix+u_i^tR_{i,i}u_i-w^tW_iw\bigr)dt.\end{aligned}$$

In (52), since each $u_j$ is minimizing at the same time $u_i$ is, we know that the control $u_j$ has the form

(53)
$$u_j=-R_{j,j}^{-1}B_j^t\bigl(P_jx+m_j\bigr).$$
Substitution of (53) into (52) gives
(54)
$$\begin{aligned}\hat V_i(T)-\hat V_i(0)=\int_0^T\Bigl(&x^t\Bigl(P_i\Bigl(A-\sum_{j\in I_{N,i}}S_jP_j\Bigr)+\Bigl(A-\sum_{j\in I_{N,i}}S_jP_j\Bigr)^tP_i+Q_i-P_iS_iP_i+P_iM_iP_i+\sum_{j\in I_{N,i}}P_jS_{j,i}P_j\Bigr)x\\
&+2x^t\Bigl(\dot m_i+A^tm_i-P_iS_im_i+P_iM_im_i+\sum_{j\in I_{N,i}}P_jS_{j,i}m_j-\sum_{j\in I_{N,i}}P_jS_jm_i-\sum_{j\in I_{N,i}}P_iS_jm_j+P_ic\Bigr)\\
&+\Bigl(\dot n_i+m_i^tM_im_i-m_i^tS_im_i+\sum_{j\in I_{N,i}}m_j^tS_{j,i}m_j-2m_i^t\sum_{j\in I_{N,i}}S_jm_j+2m_i^tc\Bigr)\\
&+\bigl\|x^tP_iB_iR_{i,i}^{-\frac12}+m_i^tB_iR_{i,i}^{-\frac12}+u_i^tR_{i,i}^{\frac12}\bigr\|^2-\bigl\|x^tP_iEW_i^{-\frac12}+m_i^tEW_i^{-\frac12}-w^tW_i^{\frac12}\bigr\|^2\\
&-\Bigl\|x^t\sum_{j\in I_{N,i}}P_jB_jR_{j,j}^{-1}R_{j,i}^{\frac12}+\sum_{j\in I_{N,i}}m_j^tB_jR_{j,j}^{-1}R_{j,i}^{\frac12}+u_j^tR_{j,i}^{\frac12}\Bigr\|^2\Bigr)dt\\
&-\int_0^T\Bigl(x^tQ_ix+u_i^tR_{i,i}u_i+\sum_{j\in I_{N,i}}u_j^tR_{j,i}u_j-w^tW_iw\Bigr)dt.\end{aligned}$$
According to (54), we have to find the solutions $m_i$ of the $N$ coupled linear equations. These are found by solving simultaneously the differential equation system
$$\begin{pmatrix}\dot m_1\\ \dot m_2\\ \vdots\\ \dot m_N\end{pmatrix}=\begin{pmatrix}A_1&P_2S_{2,1}-P_1S_2&\cdots&P_NS_{N,1}-P_1S_N\\ P_1S_{1,2}-P_2S_1&A_2&\cdots&P_NS_{N,2}-P_2S_N\\ \vdots& &\ddots&\vdots\\ P_1S_{1,N}-P_NS_1&P_2S_{2,N}-P_NS_2&\cdots&A_N\end{pmatrix}\begin{pmatrix}m_1\\ m_2\\ \vdots\\ m_N\end{pmatrix}+\begin{pmatrix}P_1c\\ P_2c\\ \vdots\\ P_Nc\end{pmatrix},$$
whose solutions $m_i$ are the $N$ expressions stated in (29) of Theorem 4.

From the assumptions on equations (24)–(30), that is, $P_i$, $m_i$, $n_i$ are solutions to equations (26), (29), (45), respectively, substituting these equations into (54) we find that

$$\begin{aligned}\hat V_i(T)-\hat V_i(0)=\int_0^T\Bigl(&\bigl\|x^tP_iB_iR_{i,i}^{-\frac12}+m_i^tB_iR_{i,i}^{-\frac12}+u_i^tR_{i,i}^{\frac12}\bigr\|^2-\bigl\|x^tP_iEW_i^{-\frac12}+m_i^tEW_i^{-\frac12}-w^tW_i^{\frac12}\bigr\|^2\\
&-\Bigl\|x^t\sum_{j\in I_{N,i}}P_jB_jR_{j,j}^{-1}R_{j,i}^{\frac12}+\sum_{j\in I_{N,i}}m_j^tB_jR_{j,j}^{-1}R_{j,i}^{\frac12}+u_j^tR_{j,i}^{\frac12}\Bigr\|^2\Bigr)dt\\
&-\int_0^T\Bigl(x^tQ_ix+u_i^tR_{i,i}u_i+\sum_{j\in I_{N,i}}u_j^tR_{j,i}u_j-w^tW_iw\Bigr)dt.\end{aligned}$$
Now, since $m_i$ converges exponentially to zero, because the matrix $A_i-\sum_{j\ne i}(P_jS_{j,i}-P_iS_j)$ is stable, $n_i$ also converges, because it depends on the $m_i$ terms and on $c$, which is a locally square integrable function converging to zero exponentially. Also, since $x(T)\to0$ as $T\to+\infty$, we have $\lim_{T\to+\infty}\hat V_i(T)=0$, obtaining
(55)
$$\begin{aligned}\hat V_i(0)=-\int_0^{+\infty}\Bigl(&\bigl\|x^tP_iB_iR_{i,i}^{-\frac12}+m_i^tB_iR_{i,i}^{-\frac12}+u_i^tR_{i,i}^{\frac12}\bigr\|^2-\bigl\|x^tP_iEW_i^{-\frac12}+m_i^tEW_i^{-\frac12}-w^tW_i^{\frac12}\bigr\|^2\\
&-\Bigl\|x^t\sum_{j\in I_{N,i}}P_jB_jR_{j,j}^{-1}R_{j,i}^{\frac12}+\sum_{j\in I_{N,i}}m_j^tB_jR_{j,j}^{-1}R_{j,i}^{\frac12}+u_j^tR_{j,i}^{\frac12}\Bigr\|^2\Bigr)dt+J_i,\end{aligned}$$
where
$$J_i=\int_0^{+\infty}\Bigl(x^tQ_ix+u_i^tR_{i,i}u_i+\sum_{j\in I_{N,i}}u_j^tR_{j,i}u_j-w^tW_iw\Bigr)dt.$$
This means that the control minimizing $J_i$ has the form of equation (27), and the term maximizing $J_i$ has the form of equation (28). In that case, when we substitute $u_i$ and $w_i$ into (55), we find that the optimal cost has the value given in (30).  □

Acknowledgements

The authors appreciate the useful suggestions made by the anonymous referee, which have helped to improve the quality of the article.

References

1. Aliyu, M.D.S. (2011). Nonlinear H∞ Control, Hamiltonian Systems and Hamilton–Jacobi Equations. CRC Press, Boca Raton, FL.

2. Başar, T., Bernhard, P. (2008). H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach, 2nd ed. Birkhäuser.

3. Başar, T., Olsder, G.J. (1999). Dynamic Noncooperative Game Theory, 2nd ed. Classics in Applied Mathematics, Vol. 23. SIAM.

4. Chen, H., Scherer, C.W., Allgöwer, F. (1997). A game theoretic approach to nonlinear robust receding horizon control of constrained systems. In: Proceedings of the American Control Conference, Vol. 5. American Automatic Control Council, pp. 3073–3077.

5. Chen, X., Zhou, K. (2001). Multiobjective H2/H∞ control design. SIAM Journal on Control and Optimization, 40(2), 628–660.

6. Dockner, E.J., Jørgensen, S., Long, N.V., Sorger, G. (2000). Differential Games in Economics and Management Science. Cambridge University Press.

7. Engwerda, J. (2005). LQ Dynamic Optimization and Differential Games. Wiley.

8. Engwerda, J.C. (2017). Robust open-loop Nash equilibria in the noncooperative LQ game revisited. Optimal Control Applications & Methods, 38(5), 795–813.

9. Fattorini, H.O. (1999). Infinite-Dimensional Optimization and Control Theory. Encyclopedia of Mathematics and Its Applications, Vol. 62. Cambridge University Press.

10. Friedman, A. (1971). Differential Games. Pure and Applied Mathematics, Vol. XXV. Wiley.

11. Jank, G., Kun, G. (2002). Optimal control of disturbed linear-quadratic differential games. European Journal of Control, 8(2), 152–162.

12. Jiménez-Lizárraga, M., Poznyak, A. (2007). Robust Nash equilibrium in multi-model LQ differential games: analysis and extraproximal numerical procedure. Optimal Control Applications and Methods, 28(2), 117–141.

13. Jørgensen, S. (1986). Optimal production, purchasing and pricing: a differential game approach. European Journal of Operational Research, 24(1), 64–76.

14. Jungers, M., Castelan, E.B., de Pieri, E.R., Abou-Kandil, H. (2008). Bounded Nash type controls for uncertain linear systems. Automatica, 44(7), 1874–1879.

15. Poznyak, A.S. (2008). Advanced Mathematical Tools for Automatic Control Engineers, Vol. 1: Deterministic Techniques. Elsevier.

16. van den Broek, W.A., Engwerda, J.C., Schumacher, J.M. (2003). Robust equilibria in indefinite linear-quadratic differential games. Journal of Optimization Theory and Applications, 119(3), 565–595.

17. Yong, J., Zhou, X.Y. (1999). Stochastic Controls: Hamiltonian Systems and HJB Equations. Applications of Mathematics, Stochastic Modelling and Applied Probability, Vol. 43. Springer.