NON-LINEAR UTILITY FUNCTIONS IN MNL DISCRETE-CHOICE MODELS
Ch.D.R. Lindveld
Faculty of Civil Engineering and Geosciences, Delft University of Technology
1 INTRODUCTION
The standard practice in modelling route choice and mode choice is to use
Random-Utility models (e.g. logit or probit) with utility functions that are linear in
parameters and often linear functions of the explanatory variables. However,
there are reasons to suspect that non-linearities exist in the valuation of attributes
in route-choice behaviour.
The literature contains a number of instances where non-linearities have been
incorporated in discrete-choice models in a pragmatic way, and more recently
gives a plausible theoretical underpinning for non-linearities.
Gaudry and Wills (1978) present an investigation into the functional form of travel
demand models, both from a cross-sectional and time-series point of view, and
introduce the use of location parameters and the Box-Cox transform. A report for
the US federal Transit Authority [CTPS (1997)] finds, after careful
examination, that the weight of one mile of walking distance in public transport (PT)
access and egress varies significantly with the distance. Mandel et al. (1993)
propose a non-linear logit model for mode choice that relies on applying a Box-Cox
transformation to the explanatory variables. One of the arguments put forth
is that the marginal disutility of an extra mile of distance should be expected to
decrease with distance.
Blajac et al. (1998), (2001) show how non-linear specification of the indirect utility
function can be derived from a utility maximisation programme. Eliasson (2000)
argues that the derivative of trip disutility w.r.t. time must contain a component
due to the direct utility function in addition to the component of the indirect utility
function, which leads to a “quality adjustment” of the value of time depending on
how and where time is spent.
This raises the question of whether adding threshold effects and non-linear
transformations to conventional logit models can be shown to improve the
models, and which method seems best. To investigate this we will estimate a
number of logit models with non-linear utility functions on a route-choice dataset
collected by earlier researchers in The Netherlands, and show a practical method
to estimate a logit model with piece-wise linear utility functions.
The results tend to show that the piecewise linear formulation gives reasonable
results and is easy to use, but that the Box-Cox transform must be used with
care, especially if it is applied to more than one variable simultaneously.
© Association for European Transport 2001
1.1 Incorporating non-linear explanatory variables in utility functions
We will use the linear-in-parameters MultiNomial Logit model (MNL), and add
non-linear transformations of the explanatory variables. The probability that
individual n chooses alternative i over alternatives j (part of the choice set of
individual n) is:
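\[
P_{in} = P(U_{in} > U_{jn}) = P(V_{in} + \varepsilon_{in} > V_{jn} + \varepsilon_{jn}) = P(V_{in} - V_{jn} > \varepsilon_n) \quad \forall j \tag{1}
\]
where ε_n = ε_jn − ε_in denotes the difference between the two error terms.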
We denote the data matrix by X and the vector of model parameters by β, and
write the systematic part of the utility function for alternative i of individual n as:
\[
V_{in}(X, \beta) = \sum_{k=1}^{K} \beta_k X_{kin} \tag{2}
\]
Denoting the choice set of individual n by C_n, the linear-in-parameters MNL (LMNL) choice probabilities are:
\[
P_{in}(X, \beta) = \frac{e^{V_{in}(X, \beta)}}{\sum_{j \in C_n} e^{V_{jn}(X, \beta)}} \tag{3}
\]
The log-likelihood of observing dataset X, given that the vector β contains the true
LMNL parameters, is:
\[
L(X \mid \beta) = \sum_{n=1}^{N} \ln\left(P_{i^{*}n}(X, \beta)\right)
\]
where i* denotes the chosen alternative. The vector of parameters β of the
LMNL is usually determined through maximum likelihood estimation:
\[
\hat{\beta} = \max_{\beta} L(X \mid \beta) \tag{4}
\]
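The choice probabilities and the log-likelihood defined above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the two toy observations are made up for the example and do not come from the case-study data.

```python
import math

def systematic_utility(beta, x_alt):
    """Linear-in-parameters utility: sum over k of beta_k * x_k for one alternative."""
    return sum(b * x for b, x in zip(beta, x_alt))

def mnl_probabilities(utilities):
    """Logit choice probabilities over one individual's choice set."""
    v_max = max(utilities)                       # shift by the maximum for numerical stability
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def log_likelihood(beta, observations):
    """observations: list of (choice_set, chosen_index); each choice set is a
    list of attribute vectors, one per alternative available to the individual."""
    ll = 0.0
    for choice_set, chosen in observations:
        v = [systematic_utility(beta, x_alt) for x_alt in choice_set]
        ll += math.log(mnl_probabilities(v)[chosen])
    return ll

# Toy data (hypothetical): attributes are (in-vehicle time, access time) in minutes.
observations = [
    ([(10.0, 3.0), (12.0, 2.0)], 0),
    ([(20.0, 5.0), (15.0, 8.0), (18.0, 4.0)], 2),
]
print(log_likelihood((-0.13, -0.30), observations))
```

Choice sets of different sizes are handled naturally, because the denominator only runs over the alternatives in each individual's own choice set.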
The Box-Cox transform
In [Gaudry and Wills (1978)], the Box-Cox transform is presented as a way of
incorporating non-linearity into the utility function of a logit model. The Box-Cox
transform of the explanatory variable X_kin is defined as:
\[
X_{kin}^{(\lambda_{ki})} =
\begin{cases}
\dfrac{X_{kin}^{\lambda_{ki}} - 1}{\lambda_{ki}} & \lambda_{ki} \neq 0 \\[1ex]
\ln X_{kin} & \lambda_{ki} = 0
\end{cases}
\qquad X_{kin} > 0 \tag{5}
\]
The idea of applying this transform in a logit model context is presented in
[Mandel et al. (1993)]. With the Box-Cox transform, the systematic part of the
utility function becomes:
\[
V_{in} = \sum_{k=1}^{K} \beta_{ki} X_{kin}^{(\lambda_{ki})} \tag{6}
\]
Note that for λ ≠ 1 the variance of the explanatory variables varies with their
magnitude.
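As an illustration of the transform in (5), the following sketch implements it for a single positive value. The function name box_cox is chosen for this example only.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("the Box-Cox transform is only defined for x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

# lam = 1 leaves the variable linear (shifted by -1); lam -> 0 approaches ln(x).
for lam in (1.0, 0.5, 0.0):
    print(lam, [round(box_cox(x, lam), 3) for x in (1.0, 5.0, 10.0, 20.0)])
```

For λ < 1 the transformed variable grows more slowly than the original, which is how the transform represents a decreasing marginal disutility.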
A formulation of the piecewise linear function
The piece-wise linear formulation is presented as a way of incorporating
non-linearity in [Ben-Akiva and Lerman (1989)]. One formulation of the piece-wise linear
function is shown in Figure 1; its functional form is as follows:
\[
v(x) =
\begin{cases}
\beta_1 x & 0 \le x < \alpha_1 \\
\beta_1 x + \beta_2 (x - \alpha_1) & \alpha_1 \le x < \alpha_2 \\
\beta_1 x + \beta_2 (x - \alpha_1) + \beta_3 (x - \alpha_2) & x \ge \alpha_2
\end{cases} \tag{7}
\]
Note that this form guarantees continuity of v(x) at α_1 and α_2. Using a suitable
indicator function, such as the function dim offered in the Alogit estimation
package (see [HCG (1995)]), which is defined as:
\[
\operatorname{dim}(x, \alpha) =
\begin{cases}
0 & x < \alpha \\
x - \alpha & x \ge \alpha
\end{cases} \tag{8}
\]
Equation (7) can be written as a linear combination of non-linear functions of x:
\[
v(x) = \beta_1 x + \beta_2 \operatorname{dim}(x, \alpha_1) + \beta_3 \operatorname{dim}(x, \alpha_2) \tag{9}
\]
This functional form is continuous (as dim(α, α) = 0) and linear in the model
parameters β (it is non-linear only in the parameters α), so that for fixed α it can be
estimated using standard MNL estimation packages.
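The formulation in (8) and (9) can be sketched as follows. The coefficient and breakpoint values are invented for illustration, and the helper piecewise_utility is a name introduced here, not part of Alogit.

```python
def dim(x, alpha):
    """0 below the breakpoint alpha, x - alpha at or above it (equation (8))."""
    return max(0.0, x - alpha)

def piecewise_utility(x, betas, alphas):
    """v(x) = beta_1 * x + sum over m of beta_(m+1) * dim(x, alpha_m) (equation (9))."""
    v = betas[0] * x
    for beta, alpha in zip(betas[1:], alphas):
        v += beta * dim(x, alpha)
    return v

betas = (-0.4, 0.2, 0.1)      # hypothetical: initial slope, then two slope changes
alphas = (5.0, 10.0)          # hypothetical breakpoints in minutes
for x in (4.0, 5.0, 8.0, 12.0):
    print(x, round(piecewise_utility(x, betas, alphas), 3))
```

Because each dim term enters as an ordinary regressor, a model with fixed breakpoints is a standard linear-in-parameters MNL whose slope after the m-th breakpoint is the sum of the first m+1 coefficients.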
[Figure: plot of v(x) against x, showing the segments β1 x, β1 x + β2 (x − α1) and β1 x + β2 (x − α1) + β3 (x − α2), with breakpoints at α1 and α2]
Figure 1: Continuous piecewise linear utility functions
1.2 Model specification and parameter estimation
Having transformed certain explanatory variables in the systematic part of the
utility function, we now have an MNL with a non-linear log-likelihood function.
Using a maximum-likelihood approach to estimate the non-linear models, our
model parameters can be determined as follows:
\[
(\hat{\alpha}, \hat{\beta}, \hat{\lambda}) = \max_{\alpha, \beta, \lambda} L(X \mid \alpha, \beta, \lambda) \tag{10}
\]
To be useful for practitioners, we wanted to re-use existing software as much as
possible. So instead of writing a dedicated solver for this problem, we have split
(10) into two sub-problems, one of which can be solved with standard MNL
estimation software, while the other can be attacked with modest means. We
note that (10) can be interpreted as:
\[
(\hat{\alpha}, \hat{\beta}, \hat{\lambda}) = \max_{\alpha, \beta, \lambda} L(X \mid \alpha, \beta, \lambda) = \max_{\alpha, \lambda} \max_{\beta} L(X, \alpha, \lambda \mid \beta) = \max_{\alpha, \lambda} L^{*}(X, \beta \mid \alpha, \lambda) \tag{11}
\]
where L*(X, β | α, λ) is known as a concentrated likelihood function (see [Amemiya
(1985), §4.2.5], where it is shown to be a valid way of estimating the parameters
given certain regularity conditions). We will refer to the maximisation problem of
finding the combination of α and λ for which the best MNL model can be
estimated:
\[
(\hat{\alpha}, \hat{\lambda}) = \max_{\alpha, \lambda} L^{*}(X, \beta \mid \alpha, \lambda) \tag{12}
\]
as problem A, and to the maximisation problem
\[
\max_{\beta} L(X, \alpha, \lambda \mid \beta) \tag{13}
\]
which is standard MNL estimation since α and λ are fixed, as problem B.
To solve problem B we have to estimate the parameters β of an MNL; the
log-likelihood function for this problem is concave, so that any local maximum found
by the Newton procedure is also the global maximum. A standard
logit estimation package (Alogit) was used to solve problem B.
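The split into problems A and B can be illustrated with a self-contained toy example. The sketch below is not the set-up used in this study: it assumes a binary choice with a single attribute and one breakpoint, simulated data, a hand-written damped Newton routine in place of Alogit for problem B, and a plain grid over α for problem A. All names and numerical values are hypothetical.

```python
import math
import random

def dim(x, alpha):
    return max(0.0, x - alpha)

def utility_terms(x, alpha):
    """Regressors of the piecewise linear utility: x and dim(x, alpha)."""
    return (x, dim(x, alpha))

def log_lik(beta, data, alpha):
    """Binary logit log-likelihood; data holds (x_a, x_b, chose_a) records."""
    total = 0.0
    for x_a, x_b, chose_a in data:
        dv = sum(b * (p - q) for b, p, q in
                 zip(beta, utility_terms(x_a, alpha), utility_terms(x_b, alpha)))
        if not chose_a:
            dv = -dv
        # log(1 / (1 + exp(-dv))), written so that exp never overflows
        total += -math.log1p(math.exp(-dv)) if dv > 0 else dv - math.log1p(math.exp(dv))
    return total

def solve_problem_b(data, alpha, max_iter=50):
    """Problem B: damped Newton ascent on beta for a fixed alpha."""
    beta = [0.0, 0.0]
    current = log_lik(beta, data, alpha)
    for _ in range(max_iter):
        g = [0.0, 0.0]
        h = [[0.0, 0.0], [0.0, 0.0]]             # negative Hessian
        for x_a, x_b, chose_a in data:
            z = [p - q for p, q in zip(utility_terms(x_a, alpha), utility_terms(x_b, alpha))]
            p_a = 1.0 / (1.0 + math.exp(-(beta[0] * z[0] + beta[1] * z[1])))
            resid = (1.0 if chose_a else 0.0) - p_a
            for r in range(2):
                g[r] += resid * z[r]
                for c in range(2):
                    h[r][c] += p_a * (1.0 - p_a) * z[r] * z[c]
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        step = [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
                (h[0][0] * g[1] - h[1][0] * g[0]) / det]
        t = 1.0
        trial = [beta[0] + step[0], beta[1] + step[1]]
        while log_lik(trial, data, alpha) < current and t > 1e-8:
            t /= 2.0                              # halve the step until the likelihood improves
            trial = [beta[0] + t * step[0], beta[1] + t * step[1]]
        new = log_lik(trial, data, alpha)
        if new - current < 1e-10:
            break
        beta, current = trial, new
    return beta, current

def solve_problem_a(data, alpha_grid):
    """Problem A by grid search: maximise the concentrated log-likelihood over alpha."""
    profile = {a: solve_problem_b(data, a) for a in alpha_grid}
    best_alpha = max(profile, key=lambda a: profile[a][1])
    return best_alpha, profile

def simulate(n, alpha_true, beta_true, seed):
    """Simulated choices between two alternatives with attribute x in [1, 20]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x_a, x_b = rng.uniform(1.0, 20.0), rng.uniform(1.0, 20.0)
        dv = sum(b * (p - q) for b, p, q in
                 zip(beta_true, utility_terms(x_a, alpha_true), utility_terms(x_b, alpha_true)))
        data.append((x_a, x_b, rng.random() < 1.0 / (1.0 + math.exp(-dv))))
    return data

data = simulate(400, alpha_true=8.0, beta_true=(-0.4, 0.25), seed=1)
best_alpha, profile = solve_problem_a(data, [4.0, 6.0, 8.0, 10.0, 12.0, 14.0])
print(best_alpha, profile[best_alpha])
```

The inner routine never has to deal with α; it only sees a different set of regressors on each call, which is what makes it replaceable by any standard MNL package.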
Finding the global maximum corresponding to problem A
The problem is to find the global maximum corresponding to problem A. In
contrast to problem B, the function L* in problem A will usually have multiple
local maxima (especially with respect to the parameter α), so that local
optimisation routines such as Newton or Quasi-Newton methods may fail to find
the global optimum.
In our formulation the likelihood function L is continuous but non-differentiable in
α, although this could be addressed by smoothing the transition around the
inflexion points. We have noted that standard minimisation algorithms (that rely
on derivative information) break down on this problem, and we suspect that this
is caused by the non-differentiability of L. In addition we can no longer rely on
local maximisation methods to find the global optimum, or even to come up with
sensible solutions, so that finding the optimal values of α and λ requires global
optimisation.
Pinter (1996) lists a variety of techniques for finding global optima, but there
seems to be no single “best” method. We see stochastic optimisation methods
(which give a certain probability of finding the optimum with a limited amount of
effort) and deterministic optimisation methods (which are guaranteed to find the
optimum, usually with high computational effort). The former include simulated
annealing, random-search and genetic programming, while the latter consist of
branch-and-bound methods and interval methods.
Brute force - gridsearch
The brute-force approach to problem A is to do a grid-search, i.e. to evaluate
problem A on a grid of values for vectors α and λ. This approach is feasible when
the dimension of α and λ is low (1 or 2), but quickly becomes infeasible as the
dimension increases.
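A small sketch shows why the grid search becomes infeasible as the dimension grows. It assumes ten candidate values per dimension and uses a toy objective in place of L*; every grid point would in practice cost one full MNL estimation.

```python
import itertools

def grid_points(values_per_dimension):
    """Cartesian product of candidate values, one list per search dimension."""
    return list(itertools.product(*values_per_dimension))

def grid_search(objective, values_per_dimension):
    """Evaluate the objective at every grid point; return (best_value, best_point)."""
    return max((objective(p), p) for p in grid_points(values_per_dimension))

candidates = list(range(3, 22, 2))           # 3, 5, ..., 21: ten values per dimension
for dims in (1, 2, 5):
    print(dims, len(candidates) ** dims)     # number of evaluations needed

def toy(p):
    """Hypothetical stand-in for L*: a concave bowl peaking at (7, 5)."""
    return -((p[0] - 7) ** 2 + (p[1] - 5) ** 2)

best_value, best_point = grid_search(toy, [candidates, candidates])
print(best_point, best_value)
```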
The number of function evaluations can be reduced by sampling the test points;
this is known as the Random Search (“RS”) method. The disadvantage is that the
points may cluster. As there is no need for the sequence to be random, this
problem can be addressed by using a “low-discrepancy sequence”, which
produces an evenly spread set of points in all dimensions. We have used the
Sobol sequence to generate the points (see [Press et al. (1995)]). Another option
would have been to use the Halton sequence. For our low-dimensional
problems (n < 5) we did not find significant differences between the two.
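To illustrate what a low-discrepancy sequence looks like, the sketch below generates Halton points, the alternative mentioned above, because the construction fits in a few lines. It is not the Sobol generator used in this study, and the function names and search box are chosen for the example.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_points(count, lower, upper, bases=(2, 3, 5, 7, 11)):
    """First `count` Halton points scaled to the box [lower, upper] (up to 5 dimensions)."""
    points = []
    for n in range(1, count + 1):
        unit = [radical_inverse(n, bases[d]) for d in range(len(lower))]
        points.append([lo + u * (hi - lo) for u, lo, hi in zip(unit, lower, upper)])
    return points

# Eight trial values for two breakpoints, each searched on [1, 21] minutes.
for point in halton_points(8, [1.0, 1.0], [21.0, 21.0]):
    print([round(v, 2) for v in point])
```

Each coordinate uses a different prime base, so successive points fill the box progressively instead of clustering.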
Random search – local optimisation
An often-used approach is to combine local optimisation with a global strategy for
picking starting points. Assuming that the global optimum has a “basin of
attraction”, i.e. a region around it in which a local minimisation routine would
converge towards it, we would have to ensure that one of our starting points hits
the basin to find the global optimum.
In the RS-LOPT (Random Search – Local OPTimisation) method starting points
are sampled, and a local minimisation routine is started at each point. The
probability that the RS-LOPT approach will find the optimum is the probability that
the basin of local convergence is hit by at least one sample point. Unfortunately
the diameter of this basin of attraction is not known in advance. As local
minimisation routines we used the Nelder-Mead simplex method and the MCS
method (see below).
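The RS-LOPT idea can be sketched as follows. The example makes two loudly stated substitutions: a simple compass (pattern) search stands in for the Nelder-Mead and MCS routines, and a toy function with two basins of attraction stands in for the concentrated likelihood. All names are hypothetical.

```python
import random

def compass_search(f, start, step=0.5, tol=1e-6):
    """Derivative-free local minimiser: probe +/- step along each axis and
    halve the step whenever no probe improves the objective."""
    x, fx = list(start), f(start)
    while step > tol:
        improved = False
        for d in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[d] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return x, fx

def rs_lopt(f, lower, upper, n_starts, seed=0):
    """Sample starting points at random and keep the best local optimum found."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        candidate = compass_search(f, start)
        if best is None or candidate[1] < best[1]:
            best = candidate
    return best

def toy_objective(p):
    """Two basins along x (near -2 and near +2); the left one is the global minimum."""
    x, y = p
    return (x * x - 4.0) ** 2 + x + (y - 1.0) ** 2

best_point, best_value = rs_lopt(toy_objective, [-4.0, -4.0], [4.0, 4.0], n_starts=20)
print([round(v, 3) for v in best_point], round(best_value, 4))
```

A single start in the right-hand basin would return the inferior local minimum; the method succeeds only because at least one of the sampled starts falls in the basin of the global one.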
Dedicated global minimisation routines
Of the available routines for global optimisation, we have selected the Multilevel
Co-ordinate Search (MCS) routine (described in [Huyer and Neumaier (1999)])
which is reported to compare favourably with standard algorithms for global
optimisation on a standard set of test problems.
Software set-up
The software set-up used to solve the combined estimation problem is shown in
Figure 2. To solve the MNL estimation problem (problem B), we used Alogit
(see [HCG (1995)]). The solver for problem A produces estimates for α and λ,
which are hard-coded in an Alogit control file. Next Alogit is started to solve
problem B (estimation of the resulting MNL model), and the results (the
log-likelihood LL and the MNL parameters β) are output to file. The solver for
problem A reads this file and revises its estimates for α and λ. The process stops
when the solver for problem A can no longer improve its solution.
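The control-file generation and the alternation between the two solvers can be sketched as below. This is an illustration in Python, not the Perl script used in this study. The template text is a made-up placeholder and is not Alogit control-file syntax, and the callables propose and estimate_mnl are hypothetical stand-ins for the global optimiser and for a run of the estimation package.

```python
from string import Template

# Made-up placeholder format: NOT actual Alogit control-file syntax.
CONTROL_TEMPLATE = Template(
    "# generated control file (illustrative format only)\n"
    "alpha_access = $alpha_access\n"
    "alpha_egress = $alpha_egress\n"
    "lambda_inveh = $lambda_inveh\n"
)

def render_control_file(params):
    """Write the current trial values from the problem-A solver into the template."""
    return CONTROL_TEMPLATE.substitute({k: f"{v:.4f}" for k, v in params.items()})

def combined_estimation(propose, estimate_mnl, max_rounds):
    """Alternate between problem A (propose) and problem B (estimate_mnl).

    propose(history) returns the next trial parameters, or None to stop;
    estimate_mnl(control_text) returns (log_likelihood, beta)."""
    history = []
    for _ in range(max_rounds):
        params = propose(history)
        if params is None:
            break
        ll, beta = estimate_mnl(render_control_file(params))
        history.append((ll, params, beta))
    return max(history, key=lambda h: h[0]) if history else None

print(render_control_file({"alpha_access": 7.0, "alpha_egress": 6.0, "lambda_inveh": 1.0}))
```

Keeping the estimation package behind a single callable means the outer solver only needs the returned log-likelihood, which is the loop shown in Figure 2.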
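[Figure: flow diagram. The global optimisation program supplies the λ, α parameters; a Perl script combines them with the Alogit template control file to generate the Alogit control file; Alogit reads the control file and the data, and returns LL and the β parameters to the global optimisation program.]
Figure 2: Software set-up to solve the combined estimation problem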
2 THE CASE STUDY
MNL models with non-linearities have been applied to an RP survey dataset
collected by Van der Waard on route- and mode choice in Public Transport
networks in The Netherlands. A detailed description of this dataset and the
choice situation can be found in [Van der Waard (1988)], but will be briefly
summarised here.
The Van der Waard study aimed at determining the relative weights of the time
spent in various elements of a Public Transport trip (access, waiting, in-vehicle,
transfer (if any), egress).
Respondents were asked for the starting and destination address of their trip,
means of travel used to reach the stop at which they were interviewed, means of
transport they would use to continue their trip, and the transport alternatives they
could list. In addition a number of segmentation questions were asked.
In the design of the study care was taken to ensure that:
• the respondents had a viable choice between route alternatives
• the choice set was simple enough for complete data to be gathered on all
alternatives
• the choice sets were diverse enough to prevent local oddities from
determining the outcome, yet comparable enough to permit pooling of the
observations
• the population of decision makers was sufficiently well-defined.
To obtain sufficiently homogeneous choice situations, the study focused on
"radial" relationships (between the city centre and outlying districts) that were
served by at least two PT alternatives with roughly comparable level of service
and identical price and carried a sufficiently large passenger stream.
In the selection of corridors, care was taken to ensure sufficient spread in PT
frequency, type of transport (bus, tram, metro, train, etc.), number of
interchanges, type of interchange and duration of the trip components (access,
egress, in-vehicle, waiting, etc.).
Choice sets typically contain between two and four PT alternatives, each with its
own characteristics (access L.O.S., frequency, transfers, mode changes, egress
etc.). In this way the respondent faced a choice between routes, and sometimes
also between PT modes.
[Figure: schematic showing an origin area (O), the city centre and a destination (D), connected by route 1, route 2 and route 3]
Figure 3: Radial relationships
Dataset description and preliminary analysis
The dataset analysed has 1095 records with at least 2 PT alternatives each. The
variables of each alternative are: access time, egress time, in-vehicle time,
number of stops, and transfer time. The known characteristics of the decision
maker are: age, gender, activity at origin and at destination. The route
alternatives are non-overlapping in 75% of all cases; and partially overlapping in
25% of the cases.
Access and egress times and walking time during transfer were estimated by
dividing the access, egress, and transfer walking distance (obtained from city
plans) by a walking speed of 4 km/h. In-vehicle times were taken from the
timetables; waiting times at the start of the trip were estimated from the
timetables using an approximation of the Weber waiting time function that was
calibrated on observed waiting times. Waiting times during transfers were
estimated as half the headway time. The number of interchanges was counted.
Table 1 shows the percentile values of the explanatory variables for the chosen
and non-chosen alternatives. The median value of the explanatory variables for
the chosen alternative is always better than that of the non-chosen alternatives.
The effective range of the explanatory variables is approximately 0.3-12 min. for
access time, 0.3-11 min. for egress time, 1.0-38 min. for in-vehicle time, 0.4-4.5
min. for waiting time, 0-3.1 minutes for walking time during transfer, and 0-2
interchanges.
Variable       Alt.   Minimum  Perc. 25  Median  Perc. 75  Perc. 95  Perc. 99  Maximum
Taccess        alt1       0.3       2.4     3.8       5.2       9.4      12.5     15.3
               alt2       0.3       3.1     4.5       6.6      10.4      14.6     19.4
               alt3       0.7       3.1     4.9       7.3      11.1      13.2     16.0
               alt4       1.0       3.0     4.5       6.9      11.0         .     12.2
Tegress        alt1       0.3       2.1     3.1       4.9       8.3      11.1     25.0
               alt2       0.3       2.1     3.5       5.2       8.3      11.1     13.2
               alt3       0.3       2.1     3.8       5.6       9.0      12.5     14.6
               alt4       0.7       2.1     4.2       5.7      10.9         .     24.3
Tinveh         alt1       1.0      10.0    16.0      21.0      31.0      38.0     50.0
               alt2       2.0      11.0    17.0      22.0      32.0      43.0     68.0
               alt3       2.0      13.0    18.0      23.0      34.5      42.0     50.0
               alt4       6.0      12.8    17.0      24.0      29.5         .     46.0
Twait2         alt1       0.4       2.1     2.9       3.7       4.5       4.9      5.4
               alt2       0.4       2.1     3.1       3.7       4.5       4.5      5.4
               alt3       0.4       2.1     2.9       3.7       4.5       5.3      5.4
               alt4       1.0       1.7     2.1       3.7       4.5         .      4.5
Ninterchange   alt1       0.0       0.0     0.0       1.0       1.0       2.0      3.0
               alt2       0.0       0.0     1.0       1.0       2.0       2.0      3.0
               alt3       0.0       0.0     0.0       1.0       1.0       2.0      2.0
               alt4       0.0       0.0     0.0       0.0       0.0       1.0      2.0
Twalktransfer  alt1       0.0       0.0     0.0       6.0      12.0      19.7     30.0
               alt2         0       0.0     4.0       8.0      15.0      21.7     30.0
               alt3         0       0.0     0.0       4.0      13.7      24.0     38.0
               alt4         -
Table 1: Percentile values of the explanatory variables by chosen and non-chosen
alternatives (n=1095)
2.1 Models originally estimated by van der Waard
In [Van der Waard (1988)] 5 models are considered, based on the following utility
functions:
\[
\begin{aligned}
1.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ttr} t_{tr} \\
2.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ntr} n_{tr} \\
3.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ttr} t_{tr} + \beta_{ntr} n_{tr} \\
4.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ttr} t_{tr} + \beta_{twlktr} t_{wlktr} \\
5.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ttr} t_{tr} + \beta_{twlktr} t_{wlktr} + \beta_{ntr} n_{tr}
\end{aligned} \tag{14}
\]
with:
β_i      : model coefficient for the explanatory variables
t_a      : access time
t_e      : egress time
t_i      : in-vehicle time
t_w      : waiting time at boarding stop
t_tr     : waiting time during transfers
n_tr     : number of transfers
t_wlktr  : walking time during transfer
In the estimation of the 5 models shown above, each individual's choice
set consisted only of the alternatives actually known to that individual. The results
are shown in Table 2; all model coefficients are significant and have the
expected sign.
               WAARD1            WAARD2            WAARD3            WAARD4            WAARD5
Converged      Yes               Yes               Yes               Yes               Yes
Observations   1095              1095              1095              1095              1095
Final log(L)   -643.4            -636.1            -632.3            -634.5            -630.0
D.O.F.         5                 5                 6                 6                 7
Rho (0)        0.214             0.223             0.228             0.225             0.230
t_inveh        -0.1348 (-9.5)    -0.1221 (-8.7)    -0.1283 (-9.0)    -0.1431 (-9.8)    -0.1350 (-9.1)
t_access       -0.2898 (-12.0)   -0.2961 (-12.1)   -0.2999 (-12.2)   -0.2968 (-12.1)   -0.3011 (-12.2)
t_egress       -0.1518 (-5.2)    -0.1439 (-5.0)    -0.1524 (-5.2)    -0.1515 (-5.2)    -0.1525 (-5.2)
t_wait         -0.1737 (-4.0)    -0.1989 (-4.5)    -0.1977 (-4.4)    -0.1917 (-4.3)    -0.2013 (-4.5)
t_transf       -0.3764 (-11.9)                     -0.1491 (-2.7)    -0.3022 (-8.5)    -0.1654 (-2.9)
n_transf                         -1.588 (-12.5)    -1.056 (-4.6)                       -0.7771 (-3.0)
t_walktr                                                             -0.3592 (-4.2)    -0.2134 (-2.1)
Table 2: Coefficient estimates (t-ratios in parentheses) obtained with Van der Waard models.
2.2 Model extensions
Model (3) from (14) has been selected as basis for extension on grounds that it
incorporates all relevant variables but is easier to use in practice than models 4
and 5 because it merely needs the number of transfers but not the walking time
during transfer.
Single inflexion point per explanatory variable
The Van der Waard model was extended by adding the following non-linear
terms to the utility function specification:
\[
\begin{aligned}
V = {} & \beta_a t_a + \beta_w t_w + \beta_i t_i + \beta_e t_e + \beta_{tr} t_{tr} + \beta_{wlktr} t_{wlktr} + \beta_{ntr} n_{tr} \\
& + \beta_{i,1} \operatorname{dim}(t_i, \alpha_{i,1}) + \beta_{a,1} \operatorname{dim}(t_a, \alpha_{a,1}) + \beta_{e,1} \operatorname{dim}(t_e, \alpha_{e,1}) \\
& + \beta_{w,1} \operatorname{dim}(t_w, \alpha_{w,1}) + \beta_{tr,1} \operatorname{dim}(t_{tr}, \alpha_{tr,1})
\end{aligned} \tag{15}
\]
To get an overview of the behaviour of (15), a grid-search was carried out on the
cube α ∈ [1,21]^5 with α = (α_i,1, α_a,1, α_e,1, α_w,1, α_tr,1), resulting in a list of estimation
results of the form (−LL, α_i,1, α_a,1, α_e,1, α_w,1, α_tr,1). The (marginal) ranges of α for
which β could be estimated are listed in Table 3.
Grid values of α: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21

Variable    Number of grid values marked x
tc_inveh    7 of 10
tc_acc      7 of 10
tc_egr      7 of 10
tc_wait     10 of 10
tc_trans    10 of 10
Table 3: Parameter ranges for which MNL models could be estimated
[Figure: two contour plots of -log-likelihood, with contour levels between about 618 and 626. Left: TC.ACC and TC.INVEH. Right: TC.ACC and TC.EGR.]
Figure 4: Contour plot of -Loglikelihood as a function of α parameters
Shown on the left is the contour plot of –Loglikelihood as a function of
α_a,1 (tc_acc) and α_i,1 (tc_inveh). It is clear that the effect of tc_acc exceeds
that of tc_inveh. Shown on the right is the contour plot of –L as a function of
α_a,1 (tc_acc) and α_e,1 (tc_egr). In both cases multiple local minima exist.
The estimation results (both α and β parameters) for four runs (three runs of
model A and one of model B) are shown in Table 4. The top rows give the
log-likelihood and the number of observations. The first block of parameters
contains the alpha parameters (the location of the inflexion points), the second
block holds the beta parameters (the weights of the explanatory variables) for the
part before the inflexion point, and the third block holds the logit parameters of
the dim terms, which apply after the inflexion point.
Run 1 of model A used grid-search, run 2 of model A used RS-LOPT, and run 3
of model A used MCS (but with a smaller search area than runs 1 and 2).
Models A and B use the same explanatory variables, but where model A has five
inflexion points, model B has three.
                          Model A           Model A           Model A           Model B
                          Run 1             Run 2             Run 3             Run 4
Log-likelihood            -613.938          -613.839          -613.870          -616.636
n                         1095              1095              1095              1095
dof                       11                11                11                9

Alpha parameters    No.
tc_inveh            1     10.0000           10.0002           10.0113           9.9988
tc_access           2     7.0000            6.9445            6.9593            6.9463
tc_egress           3     6.0000            6.2503            6.2254            6.2135
tc_wait             4     3.0000            2.9177            3.0000            0.0000
tc_transf           5     5.0000            14.6979           5.0014            0.0000

Beta parameters     No.   Coef (t-ratio)    Coef (t-ratio)    Coef (t-ratio)    Coef (t-ratio)
t_inveh             12    -0.2639 (-4.6)    -0.2625 (-4.5)    -0.2640 (-4.6)    -0.2328 (-4.2)
t_access            13    -0.3869 (-10.8)   -0.3883 (-10.8)   -0.3876 (-10.8)   -0.3829 (-10.7)
t_egress            14    -0.3339 (-6.5)    -0.3210 (-6.6)    -0.3236 (-6.6)    -0.3215 (-6.6)
t_wait              15    0.2722 (1.0)      0.3431 (1.1)      0.2741 (1.0)      -0.1978 (-4.3)
t_transf            16    -0.2363 (-3.2)    -0.1869 (-3.2)    -0.2354 (-3.2)    -0.1522 (-2.7)
n_transf            17    -0.9443 (-3.7)    -1.0210 (-4.3)    -0.9461 (-3.7)    -1.1197 (-4.8)

t_inveh_2           122   0.1520 (2.5)      0.1435 (2.4)      0.1520 (2.5)      0.1168 (2.0)
t_access_2          132   0.2098 (2.9)      0.2089 (2.9)      0.2086 (2.9)      0.2112 (2.9)
t_egress_2          142   0.3857 (4.2)      0.3811 (4.1)      0.3875 (4.2)      0.3815 (4.1)
t_wait_2            152   -0.5257 (-1.7)    -0.5961 (-1.7)    -0.5280 (-1.7)
t_transf_2          162   0.1664 (1.6)      9.0933 (1.7)      0.1659 (1.6)
Table 4: Parameter estimation results for single inflexion point model
The parameter values for the inflexion points in tc_inveh, tc_access,
tc_egress and tc_wait are close together across the runs. The exception is the parameter
tc_transf (transfer time), for which values of 5 and 14.7 were obtained in different
runs. After reducing the number of inflexion points, model B gave values for the
inflexion points that confirm those of runs 1 and 3.
The β parameters have the expected sign, and correspond closely across all
models for the variables t_inveh, t_access, t_egress, and n_transf; these
parameter values also have high t-ratios. The parameter estimates for t_wait
show more spread, and have lower t-ratios; the estimates for t_transf are clearly
influenced by the high value of tc_transf in run 2. Model B was obtained by
eliminating the inflexion points for t_transf and t_wait from model A.
The result that the net weight coefficient for egress time becomes positive for
egress times greater than about 6.2 min is counter-intuitive. This counter-intuitive
result was seen to reflect a feature of the data itself: for egress distances
greater than about 450 m (about 10% of the observations), the egress distance of the chosen
alternative is consistently longer than that of the non-chosen alternatives. No
correlation between large egress distance and other explanatory variables could
be observed. Previous researchers have voiced an (unpublished) explanation for this
phenomenon in the case of schoolchildren: schoolchildren seem to disembark in
groups a few stops before the school in order to walk to school together, which
affects the weight of egress time. If separate models are estimated per travel
purpose, in all cases the weight for egress time in the linear model is significantly
lower than for access time. The large egress times in the chosen alternative are
not offset by gains in the number of interchanges or total travel time. The phenomenon mainly
occurs in intra-city relationships; very few regional relationships were involved.
All observations were made in the a.m. period, most of them between 8 and 9.
This leads us to suspect the presence of unobserved explanatory variables. This
particular feature of the data does not seem to have been noticed before,
despite intensive study of the dataset.
We note that the value of the egress-time coefficient for the models with a linear
utility function in Table 2 is about half of that of access time. It seems that the
parameter estimate for egress time in the earlier models reflects the fact that the
linear model will interpolate the shape of the utility function that is present in the
data. We see in Table 4 that to the left of the inflexion point the weight of egress
time is closer to that of access time. Apparently the use of inflexion points has
some value as a diagnostic tool, apart from any other considerations.
The issue of the location of the inflexion point for transfer time tc_transf was
investigated by plotting the likelihood as a function of the location of the inflexion
points; the plots are shown in Figure 5.
Figure 5 : Log-likelihood as a function of the location of the inflexion points
The effects of access and egress times (alpha 2 and alpha 3) are particularly
well-defined; the effect of in-vehicle time (alpha 1) less so. The effects of access
time (alpha 2) and transfer time (alpha 5) each show two minima: the first around
7 and 5 minutes respectively, the second around 15 and 14 minutes, with likelihood
values that are close together. This explains why the MCS routine found two
different values, and it suggests that two inflexion points are needed in the
transformation of t_transf. The likelihood seems indifferent w.r.t. the location of
the inflexion point in the waiting time (alpha 4), so that this inflexion point was
eliminated from the model.
Statistical significance
The models with inflexion points contain 10 parameters more than the linear
model if the location of the inflexion point is counted as a parameter, and 5 if it is
not. Two times the log-likelihood difference between two models is asymptotically
χ² distributed with 10 or 5 degrees of freedom; the corresponding critical points at the 95% confidence level
are 18.307 and 11.0705, and at the 99% confidence level 23.2093 and 15.0863.
The log-likelihood of the linear model (model 3) is –632.3, and the log-likelihoods
of runs 1 and 2 of model A are -613.938 and -613.839 respectively. The
log-likelihood difference between the models with and without inflexion points is about 18.5; twice
this difference is about 37, so that in either case we can reject the null
hypothesis of no inflexion points at both the 95% and 99% confidence levels.
Given the log-likelihood difference between model A and model B, the null
hypothesis can only be rejected at the 6.3% significance level, so that at the
conventional 5% level it is not possible to reject the null hypothesis that the
coefficients of the inflexion-point terms for t_wait and t_transf are zero.
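The likelihood-ratio arithmetic above can be checked with a short sketch. The log-likelihoods and critical values are those quoted in the text and in Table 4. The comparison of model A with model B is assumed here to use run 3 of model A and 2 degrees of freedom, an assumption that reproduces the 6.3% figure; the helper names are introduced for this example.

```python
import math

def lr_statistic(ll_restricted, ll_unrestricted):
    """Twice the log-likelihood difference between nested models."""
    return 2.0 * (ll_unrestricted - ll_restricted)

def chi2_upper_tail_even_dof(x, dof):
    """Upper-tail probability of a chi-square variable with an even number of
    degrees of freedom, using the closed-form finite sum."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form only valid for positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Linear model 3 against model A (run 1): 10 extra parameters if the alphas are counted.
lr_linear_vs_a = lr_statistic(-632.3, -613.938)
print(round(lr_linear_vs_a, 3), lr_linear_vs_a > 23.2093)   # compare with the 99% critical value

# Model B against model A (run 3): two inflexion-point coefficients removed.
lr_b_vs_a = lr_statistic(-616.636, -613.870)
print(round(lr_b_vs_a, 3), round(chi2_upper_tail_even_dof(lr_b_vs_a, 2), 4))
```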
Resulting form of the utility functions
Using the coefficient estimates shown in Table 4, the profiles of the utility
contributions of access time and egress time were drawn up, and are shown in
Figure 6.
For access time, the inflexion point is at t = 7 min., and the slope of the utility
function decreases after this point. This suggests that people weigh access time
less heavily when it is greater than 7 minutes. Of course this may be a statistical
phenomenon rather than a causal one, and could also signify that people who
accept longer access times care less about them.
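The slopes before and after the inflexion point follow directly from the coefficients in Table 4. The sketch below uses the run 1 values of model A for access and egress time; the helper names are introduced for this example.

```python
def dim(x, alpha):
    return max(0.0, x - alpha)

def contribution(t, beta_1, beta_2, alpha):
    """Utility contribution of a time component with one inflexion point."""
    return beta_1 * t + beta_2 * dim(t, alpha)

# Table 4, model A, run 1: (coefficient, additional coefficient after the
# inflexion point, location of the inflexion point in minutes)
ACCESS = (-0.3869, 0.2098, 7.0)
EGRESS = (-0.3339, 0.3857, 6.0)

for name, (b1, b2, alpha) in (("access", ACCESS), ("egress", EGRESS)):
    print(name, "slope before:", b1, "slope after:", round(b1 + b2, 4))
    for t in (0.0, alpha, alpha + 4.0):
        print("  t =", t, "utility =", round(contribution(t, b1, b2, alpha), 4))
```

For access time the slope after the inflexion point remains negative but is less steep, whereas for egress time the two coefficients sum to a small positive value, which is the counter-intuitive result discussed above.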
[Figure: two plots. Left: "Single-inflexion point piecewise linear model", V against access time [min]. Right: "Utility contribution of egress time", utility against egress time [min].]
Figure 6: Utility contribution of access time (left) and egress time (right)
For egress time the inflexion point is at 6 min., and egress time weight is seen to
decrease, which is not realistic. Comparison of the difference in egress time
between the chosen and the non-chosen alternative (see Figure 7) shows that
there are only 5 data points with egress time greater than 12 min., which also
differ considerably in their difference between egress time for the chosen and
non-chosen alternatives (highlighted with an ellipse in Figure 7).
These datapoints were removed from the dataset and the model was
re-estimated to test their influence on the results. The estimation results for the
trimmed dataset are shown in Table 5. The parameter estimate for egress time to
the right of the inflexion point has decreased from 0.382 to 0.299. In this case the
piecewise linear model is sensitive to outliers.
As can be seen from Figure 8, the contribution to the utility function is now more
reasonable, except for purpose “school”, for which a tentative explanation was
presented above.
[Figure: scatter plot of EGRDIF21 (vertical axis, -20 to 20) against EGRES1 (horizontal axis, 0 to 20)]
Figure 7: Utility difference between egress time of chosen and non-chosen alternatives
versus egress time of the alternative
                     Standard           Piecewise linear    Piecewise linear (trimmed)
Log-likelihood       -632.3             -616.636            -616.636
n                    1095               1095                1095
dof                  6                  9                   9

Alpha parameters
tc_inveh                                9.9988              9.9988
tc_access                               6.9463              6.9463
tc_egress                               6.2135              6.2135

Beta parameters      Coef (t-ratio)     Coef (t-ratio)      Coef (t-ratio)
t_inveh              -0.1283 (-9)       -0.2328 (-4.2)      -0.2347 (-4.2)
t_access             -0.2999 (-12.2)    -0.3829 (-10.7)     -0.3854 (-10.7)
t_egress             -0.1524 (-5.2)     -0.3215 (-6.6)      -0.3067 (-6.1)
t_wait               -0.1977 (-4.4)     -0.1978 (-4.3)      -0.2016 (-4.4)
t_transf             -0.1491 (-2.7)     -0.1522 (-2.7)      -0.1507 (-2.7)
n_transf             -1.056 (-4.6)      -1.1197 (-4.8)      -1.1250 (-4.9)

t_inveh_2                               0.1168 (2.0)        0.1191 (2.0)
t_access_2                              0.2112 (2.9)        0.2167 (3.0)
t_egress_2                              0.3815 (4.1)        0.2994 (2.9)
Table 5: Estimation results after removal of outlying datapoints
[Figure: "Utility contribution of egress time (after removing 5 highest egress time obs. out of 1095)"; weighted egress time against egress time [min.], with separate curves for School, Shop and Work]
Figure 8: Contribution of egress time after trimming
Two inflexion points per explanatory variable
The results from the extension of the Van der Waard models with a single
inflexion point per explanatory variable suggest (see Figure 5, right-hand plot)
that a model specification with two inflexion points can be considered. This was
checked by estimating the following model:
\[
\begin{aligned}
V = {} & \beta_a t_a + \beta_w t_w + \beta_i t_i + \beta_e t_e + \beta_{tr} t_{tr} + \beta_{wlktr} t_{wlktr} + \beta_{ntr} n_{tr} \\
& + \beta_{i,1} \operatorname{dim}(t_i, \alpha_{i,1}) + \beta_{a,1} \operatorname{dim}(t_a, \alpha_{a,1}) + \beta_{e,1} \operatorname{dim}(t_e, \alpha_{e,1}) + \beta_{w,1} \operatorname{dim}(t_w, \alpha_{w,1}) + \beta_{tr,1} \operatorname{dim}(t_{tr}, \alpha_{tr,1}) \\
& + \beta_{i,2} \operatorname{dim}(t_i, \alpha_{i,2}) + \beta_{a,2} \operatorname{dim}(t_a, \alpha_{a,2}) + \beta_{e,2} \operatorname{dim}(t_e, \alpha_{e,2}) + \beta_{w,2} \operatorname{dim}(t_w, \alpha_{w,2}) + \beta_{tr,2} \operatorname{dim}(t_{tr}, \alpha_{tr,2})
\end{aligned} \tag{16}
\]
The location of the second inflexion point was in the 99th percentile range of the
explanatory variables, and is therefore quite sensitive to outlying observations.
Clearly there is only room for a single inflexion point. For this reason the model
specified in (16) was not investigated further.
The Box-Cox transformation
As with the piecewise linear function, model (3) from (14) has been selected as
basis for extension.
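For the variables t_a, t_e, t_i and t_w the Box-Cox transform was calculated:
\[
t_a \to t_a^{(\lambda_a)}, \quad t_e \to t_e^{(\lambda_e)}, \quad t_i \to t_i^{(\lambda_i)}, \quad t_w \to t_w^{(\lambda_w)}
\]
The following models were estimated:
\[
V = \beta_a t_a + \beta_w t_w + \beta_i t_i^{(\lambda_i)} + \beta_e t_e + \beta_{tr} t_{tr} + \beta_{wlktr} t_{wlktr} + \beta_{ntr} n_{tr} \tag{17}
\]
\[
V = \beta_a t_a^{(\lambda_a)} + \beta_w t_w + \beta_i t_i^{(\lambda_i)} + \beta_e t_e^{(\lambda_e)} + \beta_{tr} t_{tr} + \beta_{wlktr} t_{wlktr} + \beta_{ntr} n_{tr} \tag{18}
\]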
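In theory the advantage of this approach is that it shows whether or not the
non-linearity adds to the explanatory value of the model: λ = 1 reproduces the
linear specification, so an estimate of λ that differs from 1 indicates non-linearity.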
The estimation of the model shown in (17) gave λ = 0.5, which is broadly in
agreement with the results from the piecewise linear model.
However, the estimation of the model shown in (18) resulted in λ = 2.5 for in-vehicle time and λ = 1.5 for access and egress time. Unfortunately these results
lack credibility for the following reasons:
• the λ values of 2.5 for in-vehicle time and 1.5 for access and egress time
suggest an increase in the marginal disutility of a minute of access, egress or in-vehicle time, where model (17), the piecewise linear model and common sense
suggest a decrease in marginal disutility;
• values of λ > 1 imply that the error term ε_n in (1) should decrease with
increasing in-vehicle time, which is not visible in plots of the utility difference
between chosen and non-chosen alternatives (according to the Van der Waard
model 2) as a function of trip length, total trip length, or disutility of the chosen
alternative, as shown in Figure 9.
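The first argument can be checked numerically. For a transformed term β t^λ the marginal disutility of one extra minute is the derivative β λ t^(λ−1), whose magnitude falls with t when λ < 1 and rises with t when λ > 1. The sketch below evaluates this derivative for the λ values reported above; the coefficient β is hypothetical and serves only as an illustration.

```python
def marginal_disutility(t, beta, lam):
    """Derivative of beta * t**lam with respect to t, for t > 0."""
    return beta * lam * t ** (lam - 1.0)


beta = -0.10  # hypothetical coefficient, for illustration only
for lam in (0.5, 1.0, 1.5, 2.5):
    values = [marginal_disutility(t, beta, lam) for t in (5.0, 20.0, 40.0)]
    print(f"lambda = {lam}: " + ", ".join(f"{v:.4f}" for v in values))
```

With λ = 1 the derivative is constant and equal to β, which is the linear model; only values of λ below 1 reproduce the decreasing marginal disutility found with the piecewise linear model.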
[Figure 9. Axis labels: U1W2 (-14 to 0) and UDIFW3 (-6 to 6).]
Figure 9: Disutility difference (chosen - non-chosen) by disutility of chosen alternative
Unfortunately the estimation results for λ were found to be unstable between
model specifications, which suggests that the Box-Cox transformation should be
used with caution.
3 CONCLUSIONS
The addition of non-linearities “costs” extra parameters, which were
determined using Maximum Likelihood (ML) estimation. Instead of writing software
from scratch to estimate all parameters together, the ML problem was split into
ordinary MNL estimation (Problem B) and determination of the parameters that
define the non-linearity (Problem A). It was found that standard local
minimisation routines that rely on derivative information did not work
satisfactorily on Problem A, and that Problem A has multiple local minima. Our
conclusion is that Problem A has to be solved using global minimisation routines.
Problem A was solved using grid-search, random search, RS-LOPT, and the
MCS global optimisation software. Of the local optimisation methods tried in RS-LOPT only the derivative-free Simplex and MCS methods converged, the
Simplex method very slowly. The MCS routine found the same optima as grid-search and RS-LOPT, but quicker by a factor of 60. We remain reluctant to
put unchecked trust in global minimisation routines, and recommend at least some
checks based on grid-search or RS. With that proviso, estimating the location of
inflexion points using the MCS optimisation routine as described appears feasible.
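The split into Problem A and Problem B can be sketched as a nested search. The code below is an illustration under stated assumptions, not the software used in this study: it uses a binary logit instead of a full MNL model, synthetic data, a damped Newton method for Problem B, and a plain grid-search (not MCS) for Problem A. It maximises the log-likelihood, which is equivalent to minimising its negative. All function names and numerical values are hypothetical, and dim(t, α) is assumed to be max(0, t − α).

```python
import math
import random


def dim(t, alpha):
    # Assumed meaning of dim(): positive difference max(0, t - alpha).
    return max(0.0, t - alpha)


def loglik(b, rows):
    """Binary-logit log-likelihood; rows hold (x1, x2, y), y = 1 if alt. 1 chosen."""
    total = 0.0
    for x1, x2, y in rows:
        v = b[0] * x1 + b[1] * x2          # utility difference V1 - V2
        z = v if y == 1 else -v
        # log of the logistic function of z, written to avoid overflow
        total += min(z, 0.0) - math.log1p(math.exp(-abs(z)))
    return total


def fit_logit(rows, iters=50):
    """Problem B: maximise the log-likelihood over (b0, b1) by damped Newton steps."""
    b = [0.0, 0.0]
    ll = loglik(b, rows)
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x1, x2, y in rows:
            v = b[0] * x1 + b[1] * x2
            e = math.exp(-abs(v))
            p = 1.0 / (1.0 + e) if v >= 0 else e / (1.0 + e)
            r, w = y - p, p * (1.0 - p)
            g0 += r * x1
            g1 += r * x2
            h00 += w * x1 * x1
            h01 += w * x1 * x2
            h11 += w * x2 * x2
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:                   # information matrix (nearly) singular
            break
        step = [(h11 * g0 - h01 * g1) / det, (h00 * g1 - h01 * g0) / det]
        scale, improved = 1.0, False
        while scale > 1e-6:                # halve the step until no decrease
            trial = [b[0] + scale * step[0], b[1] + scale * step[1]]
            ll_trial = loglik(trial, rows)
            if ll_trial >= ll:
                improved = True
                break
            scale *= 0.5
        if not improved:
            break
        gain = ll_trial - ll
        b, ll = trial, ll_trial
        if gain < 1e-10:
            break
    return b, ll


def profile_loglik(alpha, data):
    """Log-likelihood at a fixed inflexion point, after re-solving Problem B."""
    rows = [(ta - tb, dim(ta, alpha) - dim(tb, alpha), y) for ta, tb, y in data]
    return fit_logit(rows)[1]


def grid_search(data, grid):
    """Problem A: pick the inflexion location with the best profile log-likelihood."""
    return max(grid, key=lambda a: profile_loglik(a, data))


# Synthetic choices between two alternatives with travel times ta and tb.
random.seed(1)
data = []
for _ in range(400):
    ta, tb = random.uniform(0, 20), random.uniform(0, 20)
    v = -0.4 * (ta - tb) + 0.3 * (dim(ta, 8.0) - dim(tb, 8.0))
    y = 1 if random.random() < 1.0 / (1.0 + math.exp(-v)) else 0
    data.append((ta, tb, y))

grid = [float(a) for a in range(2, 19)]
print("best inflexion point on grid:", grid_search(data, grid))
```

The outer search needs only function values of the profile log-likelihood, which is why derivative-free and global methods can be applied to Problem A while a standard logit estimator handles Problem B.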
The extra logit parameters that define inflexion points are statistically different
from zero, and lead to improved model fit. The results are behaviourally credible
in the sense that the disutility of a small increase in access or egress time varies
with the total amount of access/egress time already experienced. A counter-intuitive result w.r.t. egress time was found, but turns out to be a feature of the
data and could be due to unobserved variables. The dataset under consideration
shows evidence of non-linearity in the valuation of access time, egress time, and
in-vehicle time for route-choice in PT networks.
Models with two inflexion points could not be estimated on this dataset. Models
using the Box-Cox transformation were also estimated, but gave unstable or
counter-intuitive results.
We note that the current dataset, being based on urban public transport trips in
radial corridors in densely populated areas of The Netherlands, contains only
short trips with short egress and access times. This dataset is not representative
of interurban (regional) transport, so the findings of non-linearities in the
utility function cannot be generalised.
4 REFERENCES
Amemiya, T. (1985) Advanced econometrics. Harvard University Press.
Ben-Akiva, M., Lerman, S.R. (1989) Discrete choice analysis. Theory and
application to travel demand. MIT press.
Blajac, T., Causse, A. (1998) Valeur du temps de transport: l’apport de la
modélisation micro-économique du choix modal. Research report No. DT 98-01,
Laboratoire Montpelliérain d’Economie Théorique et Appliquée, University of
Montpellier, France.
Blajac, T., Causse, A. (2001) Value of travel time: a theoretical legitimization of
some nonlinear representative utility in discrete choice models. Transp. Res. B
35 (2001) pp. 391-400.
CTPS (1997) Transfer penalties in urban mode choice modelling. Report
prepared for the US Department of Transportation, Federal Transit
Administration, Federal Highway administration, Office of the Secretary, US
Environmental protection Agency under the Travel model improvement program
(TMIP).
Eliasson (2000) Transport and location analysis. Ph.D. thesis. Royal Institute of
Technology, Dept. of Infrastructure and Planning, Transport and Location Analysis
division, Stockholm, Sweden. Report TRITA-IP FR 00-79.
Gaudry, M.J.I., Wills, M. (1978) Estimating the functional form of travel demand
models. Transpn. Res. Vol. 12, pp. 257-289.
HCG (1995) Alogit User’s guide. Hague Consulting Group, The Netherlands.
Huyer, W., Neumaier, A. (1999) Global optimization by multilevel coordinate
search. Journal of Global Optimization 14 (1999), 331-355.
Lotan, T., Koutsopoulos, H.N. (1993) Models for route choice behaviour in the
presence of information using concepts from fuzzy set theory and approximate
reasoning. Transportation 20: 129-155, 1993.
Mandel, B., Gaudry, M., Rothengatter, W. (1993) A disaggregate Box-Cox logit
mode choice model of intercity passenger travel in Germany. Departement de
sciences economiques; cahier 9307, ISSN 0709-9231.
Pinter, J.D. (1996) Global optimisation in action. Kluwer, Dordrecht.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P. (1995) Numerical
recipes in C. (Second edition). Cambridge University Press.
Van der Waard, J. (1988) Onderzoek weging tijdelementen. Deelrapport 2,
Dataverzameling en verwerking (In Dutch). Delft University of Technology, faculty
of Civil Engineering, Transportation Planning section. Report no. VK 5302.302.