NON-LINEAR UTILITY FUNCTIONS IN MNL DISCRETE-CHOICE MODELS

Ch.D.R. Lindveld
Faculty of Civil Engineering and Geosciences, Delft University of Technology

1 INTRODUCTION

The standard practice in modelling route choice and mode choice is to use Random-Utility models (e.g. logit or probit) with utility functions that are linear in parameters and often linear functions of the explanatory variables. However, there are reasons to suspect that non-linearities exist in the valuation of attributes in route-choice behaviour. The literature contains a number of instances where non-linearities have been incorporated in discrete-choice models in a pragmatic way, and more recently gives a plausible theoretical underpinning for non-linearities.

Gaudry and Wills (1978) present an investigation into the functional form of travel demand models, both from a cross-sectional and a time-series point of view, and introduce the use of location parameters and the Box-Cox transform. A report for the US Federal Transit Authority [CTPS (1997)] finds, after careful examination, that the weight of one mile of walking distance in public transport (PT) access and egress varies significantly with the distance. Mandel et al. (1993) propose a non-linear logit model for mode choice that relies on applying a Box-Cox transformation to the explanatory variables. One of the arguments put forth is that the marginal disutility of an extra mile of distance should be expected to decrease with distance. Blajac et al. (1998), (2001) show how a non-linear specification of the indirect utility function can be derived from a utility maximisation programme. Eliasson (2000) argues that the derivative of trip disutility w.r.t. time must contain a component due to the direct utility function in addition to the component of the indirect utility function, which leads to a “quality adjustment” of the value of time depending on how and where time is spent.
This raises the question of whether adding threshold effects and non-linear transformations to conventional logit models can be shown to improve the models, and which method seems best. To investigate this we will estimate a number of logit models with non-linear utility functions on a route-choice dataset collected by earlier researchers in The Netherlands, and show a practical method to estimate a logit model with piecewise linear utility functions. The results tend to show that the piecewise linear formulation gives reasonable results and is easy to use, but that the Box-Cox transform must be used with care, especially if it is applied to more than one variable simultaneously.

© Association for European Transport 2001

1.1 Incorporating non-linear explanatory variables in utility functions

We will use the linear-in-parameters MultiNomial Logit model (MNL), and add non-linear transformations of the explanatory variables. The probability that individual n chooses alternative i over the alternatives j (part of the choice set of individual n) is:

$$P_{in} = P\left(U_{in} > U_{jn}\ \forall j\right) = P\left(V_{in} + \varepsilon_{in} > V_{jn} + \varepsilon_{jn}\ \forall j\right) = P\left(V_{in} - V_{jn} > \varepsilon_{n}\ \forall j\right) \tag{1}$$

Denoting the data matrix by X and the vector of model parameters by β, the systematic part of the utility function for alternative i of individual n is:

$$V_{in}(X, \beta) = \sum_{k=1}^{K} \beta_k X_{kin} \tag{2}$$

Denoting the choice set of individual n by $C_n$, the LMNL choice probabilities are:

$$P_{in}(X, \beta) = \frac{e^{V_{in}(X, \beta)}}{\sum_{j \in C_n} e^{V_{jn}(X, \beta)}} \tag{3}$$

The log-likelihood of observing dataset X, given that the vector β contains the true LMNL parameters, is:

$$L(X \mid \beta) = \sum_{n=1}^{N} \ln P_{i^* n}(X, \beta)$$

where $i^*$ denotes the chosen alternative. The vector of parameters β of the LMNL is usually determined through maximum likelihood estimation:

$$\hat{\beta} = \arg\max_{\beta} L(X \mid \beta) \tag{4}$$

The Box-Cox transform

In [Gaudry and Wills (1978)], the Box-Cox transform is presented as a way of incorporating non-linearity into the utility function of a logit model.
The Box-Cox transform of the explanatory variables $X_{kin}$ is defined as:

$$X_{kin}^{(\lambda_{ki})} = \begin{cases} \dfrac{X_{kin}^{\lambda_{ki}} - 1}{\lambda_{ki}} & \lambda_{ki} \neq 0 \\[2ex] \ln X_{kin} & \lambda_{ki} = 0 \end{cases} \qquad X_{kin} > 0 \tag{5}$$

The idea of applying this transform in a logit model context is presented in [Mandel et al. (1993)]. With the Box-Cox transform, the systematic part of the utility function becomes:

$$V_{in} = \sum_{k=1}^{K} \beta_{ki} X_{kin}^{(\lambda_{ki})} \tag{6}$$

Note that for λ ≠ 1 the variance of the explanatory variables varies with their magnitude.

A formulation of the piecewise linear function

The piecewise linear formulation is presented as a way of incorporating non-linearity in [Ben-Akiva and Lerman (1989)]. A formulation of the piecewise linear function is the one shown in Figure 1; its functional form is as follows:

$$v(x) = \begin{cases} \beta_1 x & 0 \le x < \alpha_1 \\ \beta_1 x + \beta_2 (x - \alpha_1) & \alpha_1 \le x < \alpha_2 \\ \beta_1 x + \beta_2 (x - \alpha_1) + \beta_3 (x - \alpha_2) & x \ge \alpha_2 \end{cases} \tag{7}$$

Note that this form guarantees continuity of $v(x)$ at $\alpha_1$ and $\alpha_2$. Using a suitable indicator function, such as the function dim offered in the Alogit estimation package (see [HCG (1995)]), which is defined as:

$$\mathrm{dim}(x, \alpha) = \begin{cases} 0 & x < \alpha \\ x - \alpha & x \ge \alpha \end{cases} \tag{8}$$

Equation (7) can be written as a linear combination of non-linear functions of x:

$$v(x) = \beta_1 x + \beta_2\, \mathrm{dim}(x, \alpha_1) + \beta_3\, \mathrm{dim}(x, \alpha_2) \tag{9}$$

This functional form is continuous (as dim(α, α) = 0) and linear in the model parameters β (it is only non-linear in the parameters α), so that for given α it can be estimated using standard MNL estimation packages.

[Figure 1: Continuous piecewise linear utility functions. The plot shows v(x) against x, with slope β1 up to α1 and slope changes β2 at α1 and β3 at α2.]

1.2 Model specification and parameter estimation

Having transformed certain explanatory variables in the systematic part of the utility function, we now have an MNL with a non-linear log-likelihood function.
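The two transformations that feed this log-likelihood, the Box-Cox transform (5) and the dim-based piecewise linear form (9), can be sketched in a few lines of Python. This is only an illustration: the function names `box_cox` and `piecewise_utility` and the coefficient values are hypothetical, and the Python `dim` merely mimics definition (8) rather than calling Alogit.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x, following equation (5)."""
    if x <= 0:
        raise ValueError("the Box-Cox transform requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def dim(x, alpha):
    """Hinge function of equation (8): 0 below alpha, x - alpha from alpha upwards."""
    return x - alpha if x >= alpha else 0.0


def piecewise_utility(x, betas, alphas):
    """Equation (9): betas[0] * x plus one dim term per inflexion point."""
    v = betas[0] * x
    for beta, alpha in zip(betas[1:], alphas):
        v += beta * dim(x, alpha)
    return v


# Hypothetical coefficients and inflexion points, not estimates from the paper.
betas = [-0.30, 0.20, 0.05]
alphas = [7.0, 12.0]

for minutes in (5.0, 7.0, 10.0, 15.0):
    print(minutes, round(piecewise_utility(minutes, betas, alphas), 4))

# lam = 1 only shifts the variable by -1; lam = 0 gives the logarithm.
print(box_cox(4.0, 1.0), box_cox(4.0, 0.5), box_cox(4.0, 0.0))
```

Because each dim term is just another column of data once α is fixed, the utility stays linear in β, which is what allows a standard MNL package to estimate it.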
Using a maximum-likelihood approach to estimate the non-linear models, our model parameters can be determined as follows:

$$(\hat{\alpha}, \hat{\beta}, \hat{\lambda}) = \arg\max_{\alpha, \beta, \lambda} L(X \mid \alpha, \beta, \lambda) \tag{10}$$

To be useful for practitioners, we wanted to re-use existing software as much as possible. So instead of writing a dedicated solver for this problem, we have split (10) into two sub-problems, one of which can be solved with standard MNL estimation software, while the other can be attacked with modest means. We note that the maximisation in (10) can be interpreted as:

$$\max_{\alpha, \beta, \lambda} L(X \mid \alpha, \beta, \lambda) = \max_{\alpha, \lambda}\, \max_{\beta} L(X, \alpha, \lambda \mid \beta) = \max_{\alpha, \lambda} L^{*}(X, \beta \mid \alpha, \lambda) \tag{11}$$

where $L^{*}(X, \beta \mid \alpha, \lambda)$ is known as a concentrated likelihood function (see [Amemiya (1985), §4.2.5], where it is shown to be a valid way of estimating the parameters given certain regularity conditions). We will refer to the maximisation problem of finding the combination of α and λ for which the best MNL model can be estimated:

$$(\hat{\alpha}, \hat{\lambda}) = \arg\max_{\alpha, \lambda} L^{*}(X, \beta \mid \alpha, \lambda) \tag{12}$$

as problem A, and to the maximisation problem

$$\max_{\beta} L(X, \alpha, \lambda \mid \beta) \tag{13}$$

which is standard MNL estimation since α and λ are fixed, as problem B.

To solve problem B we have to estimate the parameters β of an MNL; the log-likelihood function for this problem is concave, so that any local maximum found by the Newton procedure is also the global maximum. A standard logit estimation package (Alogit) was used to solve problem B.

Finding the global maximum corresponding to problem A

The problem is to find the global maximum corresponding to problem A. In contrast to problem B, the function $L^{*}$ in problem A will usually have multiple local maxima (especially with respect to the parameter α), so that local optimisation routines such as Newton or Quasi-Newton methods may fail to find the global optimum. In our formulation the likelihood function L is continuous but non-differentiable in α, although this could be addressed by smoothing the transition around the inflexion points.
We have noted that standard minimisation algorithms (that rely on derivative information) break down on this problem, and we suspect that this is caused by the non-differentiability of L. In addition we can no longer rely on local maximisation methods to find the global optimum, or even to come up with sensible solutions, so that finding the optimal values of α and λ requires global optimisation.

Pinter (1996) lists a variety of techniques for finding global optima, but there seems to be no single “best” method. We distinguish stochastic optimisation methods (which give a certain probability of finding the optimum with a limited amount of effort) and deterministic optimisation methods (which are guaranteed to find the optimum, usually with high computational effort). The former include simulated annealing, random search and genetic programming, while the latter consist of branch-and-bound methods and interval methods.

Brute force - grid-search

The brute-force approach to problem A is to do a grid-search, i.e. to evaluate problem A on a grid of values for the vectors α and λ. This approach is feasible when the dimension of α and λ is low (1 or 2), but quickly becomes infeasible as the dimension increases.

The number of function evaluations can be reduced by sampling our test points; this is known as the Random Search (“RS”) method. The disadvantage is that the points may cluster. As there is no need for the sequence to be random, this problem can be addressed by using a “low-discrepancy sequence”, which produces an evenly spaced set of points in all dimensions. We have used the Sobol sequence to generate the points (see [Press et al. (1995)]). Another option would have been to use the Halton sequence. For our low-dimensional problems (n < 5) we did not find significant differences between the two.
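To make the idea of a low-discrepancy sequence concrete, the sketch below generates Halton points, the alternative mentioned above, and scales them to a search box for α. The Halton sequence is chosen here only because it fits in a few lines; it is not the Sobol generator used for the results in this paper, and the function names and the search box [1, 21] in two dimensions are assumptions made for illustration.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer index in the given base."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n_points, lower, upper, bases=(2, 3, 5, 7, 11)):
    """First n_points Halton points, scaled to the box [lower, upper] (up to 5 dimensions)."""
    dims = len(lower)
    if dims > len(bases):
        raise ValueError("not enough prime bases for this dimension")
    points = []
    for i in range(1, n_points + 1):
        unit = [radical_inverse(i, bases[d]) for d in range(dims)]
        points.append([lower[d] + unit[d] * (upper[d] - lower[d]) for d in range(dims)])
    return points


# Hypothetical search box for two inflexion points, in minutes.
candidates = halton_points(8, lower=[1.0, 1.0], upper=[21.0, 21.0])
for point in candidates:
    print([round(value, 3) for value in point])
```

Each candidate point would then be passed to problem B once, exactly as a grid point would, so the sampling scheme changes only where the concentrated likelihood is evaluated.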
Random search - local optimisation

An often-used approach is to combine local optimisation with a global strategy for picking starting points. Assuming that the global optimum has a “basin of attraction”, i.e. a region around it in which a local minimisation routine would converge towards it, we would have to ensure that one of our starting points hits the basin to find the global optimum.

In the RS-LOPT (Random Search - Local OPTimisation) method starting points are sampled, and a local minimisation routine is started at each point. The probability that the RS-LOPT approach will find the optimum is the probability that the basin of local convergence is hit by at least one sample point. Unfortunately the diameter of this basin of attraction is not known in advance. As local minimisation routines we used the Nelder-Mead simplex method and the MCS method (see below).

Dedicated global minimisation routines

Of the available routines for global optimisation, we have selected the Multilevel Co-ordinate Search (MCS) routine (described in [Huyer and Neumaier (1999)]), whose authors report that it compares favourably to standard algorithms for global optimisation on a standard set of test problems.

Software set-up

The software set-up used to solve the combined estimation problem is shown in Figure 2. To solve the MNL estimation problem (problem B), we used Alogit (see [HCG (1995)]). The solver for problem A produces estimates for α and λ, which are hard-coded in an Alogit control file. Next Alogit is started to solve problem B (estimation of the resulting MNL model), and the results (the log-likelihood LL and the MNL parameters β) are output to file. The solver for problem A reads this file and revises its estimates for α and λ. The process stops when the solver for problem A can no longer improve its solution.
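The division of labour between problem A and problem B can be illustrated with a small, self-contained Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the choice is binary with a single explanatory variable and one inflexion point, the inner solver `solve_problem_b` is a hand-written Newton routine standing in for Alogit, and the outer solver is a plain grid-search rather than MCS.

```python
import math
import random


def dim(x, alpha):
    return x - alpha if x >= alpha else 0.0


def sigmoid(v):
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def log_sigmoid(s):
    # Numerically stable log of the logistic function.
    return -math.log1p(math.exp(-s)) if s >= 0 else s - math.log1p(math.exp(s))


def make_rows(data, alpha):
    """Attribute differences between the two alternatives for a fixed inflexion point."""
    return [(x1 - x2, dim(x1, alpha) - dim(x2, alpha), y) for x1, x2, y in data]


def log_lik(beta, rows):
    total = 0.0
    for z1, z2, y in rows:
        v = beta[0] * z1 + beta[1] * z2
        total += log_sigmoid(v if y == 1 else -v)
    return total


def solve_problem_b(rows, max_iter=50, tol=1e-10):
    """Problem B stand-in: Newton steps with step halving for a two-parameter binary logit."""
    beta = [0.0, 0.0]
    ll = log_lik(beta, rows)
    for _ in range(max_iter):
        g0 = g1 = h01 = 0.0
        h00 = h11 = 1e-9  # tiny ridge keeps the 2x2 system solvable
        for z1, z2, y in rows:
            p = sigmoid(beta[0] * z1 + beta[1] * z2)
            g0 += (y - p) * z1
            g1 += (y - p) * z2
            w = p * (1.0 - p)
            h00 += w * z1 * z1
            h01 += w * z1 * z2
            h11 += w * z2 * z2
        det = h00 * h11 - h01 * h01
        step = [(h11 * g0 - h01 * g1) / det, (h00 * g1 - h01 * g0) / det]
        t, improved = 1.0, False
        while t > 1e-8:
            cand = [beta[0] + t * step[0], beta[1] + t * step[1]]
            ll_cand = log_lik(cand, rows)
            if ll_cand > ll:
                improved = True
                break
            t /= 2.0
        if not improved:
            break
        gain = ll_cand - ll
        beta, ll = cand, ll_cand
        if gain < tol:
            break
    return ll, beta


def solve_problem_a(data, alpha_grid):
    """Problem A by grid-search: keep the alpha whose concentrated likelihood is highest."""
    best = None
    for alpha in alpha_grid:
        ll, beta = solve_problem_b(make_rows(data, alpha))
        if best is None or ll > best[0]:
            best = (ll, alpha, beta)
    return best


# Synthetic binary choices generated with a hypothetical inflexion point at 7 minutes.
random.seed(1)
data = []
for _ in range(500):
    x1, x2 = random.uniform(0.0, 20.0), random.uniform(0.0, 20.0)
    v = -0.4 * (x1 - x2) + 0.25 * (dim(x1, 7.0) - dim(x2, 7.0))
    data.append((x1, x2, 1 if random.random() < sigmoid(v) else 0))

alpha_grid = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
best = solve_problem_a(data, alpha_grid)
print("log-likelihood %.3f at alpha = %.1f, beta = %s" % best)
```

In the actual set-up the call to `solve_problem_b` corresponds to writing a control file, running Alogit and reading back LL and β; keeping that step behind a single function is what makes it easy to swap the outer grid-search for RS-LOPT or MCS.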
[Figure 2: Software set-up to solve the combined estimation problem. Components shown: global optimisation program (λ, α parameters), Alogit template control file, control file generation (Perl script), Alogit control file, data, Alogit (LL, β parameters).]

2 THE CASE STUDY

MNL models with non-linearities have been applied to an RP survey dataset collected by Van der Waard on route and mode choice in Public Transport networks in The Netherlands. A detailed description of this dataset and the choice situation can be found in [Van der Waard (1988)], but will be briefly summarised here.

The Van der Waard study aimed at determining the relative weights of the time spent in various elements of a Public Transport trip (access, waiting, in-vehicle, transfer (if any), egress).

Respondents were asked for the starting and destination address of their trip, the means of travel used to reach the stop at which they were interviewed, the means of transport they would use to continue their trip, and the transport alternatives they could list. In addition a number of segmentation questions were asked.

In the design of the study care was taken to ensure that:
• the respondents had a viable choice between route alternatives
• the choice set was simple enough for complete data to be gathered on all alternatives
• the choice sets were diverse enough to prevent local oddities from determining the outcome, yet comparable enough to permit pooling of the observations
• the population of decision makers was sufficiently well-defined.

To obtain sufficiently homogeneous choice situations, the study focused on "radial" relationships (between the city centre and outlying districts) that were served by at least two PT alternatives with roughly comparable level of service and identical price, and that carried a sufficiently large passenger stream.
In the selection of corridors, care was taken to ensure sufficient spread in PT frequency, type of transport (bus, tram, metro, train, etc.), number of interchanges, type of interchange and duration of the trip components (access, egress, in-vehicle, waiting, etc.). Choice sets typically contain between two and four PT alternatives, each with its own characteristics (access L.O.S., frequency, transfers, mode changes, egress etc.). In this way the respondent faced a choice between routes, and sometimes also between PT modes.

[Figure 3: Radial relationships. Labels shown: origin area (O), city centre, destination (D), routes 1 to 3.]

Dataset description and preliminary analysis

The dataset analysed has 1095 records with at least 2 PT alternatives each. The variables of each alternative are: access time, egress time, in-vehicle time, number of stops, and transfer time. The known characteristics of the decision maker are: age, gender, activity at origin and at destination. The route alternatives are non-overlapping in 75% of all cases, and partially overlapping in 25% of the cases.

Access and egress times and walking time during transfer were estimated by dividing the access, egress, and transfer walking distance (obtained from city plans) by a walking speed of 4 km/h. In-vehicle times were taken from the timetables. Waiting times at the start of the trip were estimated from the timetables using an approximation of the Weber waiting time function that was calibrated on observed waiting times. Waiting times during transfers were estimated as half the headway time. The number of interchanges was counted.

Table 1 shows the percentile values of the explanatory variables for the chosen and non-chosen alternatives. The median value of the explanatory variables for the chosen alternative is always better than that of the non-chosen alternatives. The effective range of the explanatory variables is approximately 0.3-12 min. for access time, 0.3-11 min.
for egress time, 1.0-38 min. for in-vehicle time, 0.4-4.5 min. for waiting time, 0-3.1 minutes for walking time during transfer, and 0-2 interchanges.

Variable       Alt.   Minimum   P25    Median   P75    P95    P99    Maximum
Taccess        alt1   0.3       2.4    3.8      5.2    9.4    12.5   15.3
               alt2   0.3       3.1    4.5      6.6    10.4   14.6   19.4
               alt3   0.7       3.1    4.9      7.3    11.1   13.2   16.0
               alt4   1.0       3.0    4.5      6.9    11.0   .      12.2
Tegress        alt1   0.3       2.1    3.1      4.9    8.3    11.1   25.0
               alt2   0.3       2.1    3.5      5.2    8.3    11.1   13.2
               alt3   0.3       2.1    3.8      5.6    9.0    12.5   14.6
               alt4   0.7       2.1    4.2      5.7    10.9   .      24.3
Tinveh         alt1   1.0       10.0   16.0     21.0   31.0   38.0   50.0
               alt2   2.0       11.0   17.0     22.0   32.0   43.0   68.0
               alt3   2.0       13.0   18.0     23.0   34.5   42.0   50.0
               alt4   6.0       12.8   17.0     24.0   29.5   .      46.0
Twait2         alt1   0.4       2.1    2.9      3.7    4.5    4.9    5.4
               alt2   0.4       2.1    3.1      3.7    4.5    4.5    5.4
               alt3   0.4       2.1    2.9      3.7    4.5    5.3    5.4
               alt4   1.0       1.7    2.1      3.7    4.5    .      4.5
Ninterchange   alt1   0.0       0.0    0.0      1.0    1.0    2.0    3.0
               alt2   0.0       0.0    1.0      1.0    2.0    2.0    3.0
               alt3   0.0       0.0    0.0      1.0    1.0    2.0    2.0
               alt4   0.0       0.0    0.0      0.0    0.0    1.0    2.0
Twalktransfer  alt1   0.0       0.0    0.0      6.0    12.0   19.7   30.0
               alt2   0         0.0    4.0      8.0    15.0   21.7   30.0
               alt3   0         0.0    0.0      4.0    13.7   24.0   38.0
               alt4   -

Table 1: Percentile values (P25 to P99) of the explanatory variables by chosen and non-chosen alternatives (n=1095)

2.1 Models originally estimated by Van der Waard

In [Van der Waard (1988)] 5 models are considered, based on the following utility functions:

$$\begin{aligned}
1.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ttr} t_{tr} \\
2.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ntr} n_{tr} \\
3.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ttr} t_{tr} + \beta_{ntr} n_{tr} \\
4.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ttr} t_{tr} + \beta_{twlktr} t_{wlktr} \\
5.\quad V &= \beta_a t_a + \beta_e t_e + \beta_i t_i + \beta_w t_w + \beta_{ttr} t_{tr} + \beta_{twlktr} t_{wlktr} + \beta_{ntr} n_{tr}
\end{aligned} \tag{14}$$

with:
β       : model coefficients for the explanatory variables
t_a     : access time
t_e     : egress time
t_i     : in-vehicle time
t_w     : waiting time at boarding stop
t_tr    : waiting time during transfers
n_tr    : number of transfers
t_wlktr : walking time during transfer

In the estimation of the 5 models shown above, each individual's choice set consisted only of the alternatives actually known to that individual. The results are shown in Table 2; all model coefficients are significant, and have the expected sign.

               WAARD1            WAARD2            WAARD3            WAARD4            WAARD5
Converged      Yes               Yes               Yes               Yes               Yes
Observations   1095              1095              1095              1095              1095
Final log(L)   -643.4            -636.1            -632.3            -634.5            -630.0
D.O.F.         5                 5                 6                 6                 7
Rho (0)        0.214             0.223             0.228             0.225             0.230
t_inveh        -0.1348 (-9.5)    -0.1221 (-8.7)    -0.1283 (-9.0)    -0.1431 (-9.8)    -0.1350 (-9.1)
t_access       -0.2898 (-12.0)   -0.2961 (-12.1)   -0.2999 (-12.2)   -0.2968 (-12.1)   -0.3011 (-12.2)
t_egress       -0.1518 (-5.2)    -0.1439 (-5.0)    -0.1524 (-5.2)    -0.1515 (-5.2)    -0.1525 (-5.2)
t_wait         -0.1737 (-4.0)    -0.1989 (-4.5)    -0.1977 (-4.4)    -0.1917 (-4.3)    -0.2013 (-4.5)
t_transf       -0.3764 (-11.9)                     -0.1491 (-2.7)    -0.3022 (-8.5)    -0.1654 (-2.9)
n_transf                         -1.588 (-12.5)    -1.056 (-4.6)                       -0.7771 (-3.0)
t_walktr                                                             -0.3592 (-4.2)    -0.2134 (-2.1)

Table 2: Coefficient estimates obtained with Van der Waard models (t-ratios in parentheses).

2.2 Model extensions

Model (3) from (14) has been selected as the basis for extension on the grounds that it incorporates all relevant variables but is easier to use in practice than models 4 and 5, because it merely needs the number of transfers and not the walking time during transfer.
Single inflexion point per explanatory variable

The Van der Waard model was extended by adding the following non-linear terms to the utility function specification:

$$\begin{aligned}
V ={} & \beta_a t_a + \beta_w t_w + \beta_i t_i + \beta_e t_e + \beta_{tr} t_{tr} + \beta_{wlktr} t_{wlktr} + \beta_{ntr} n_{tr} \\
& + \beta_{i,1}\, \mathrm{dim}(t_i, \alpha_{i,1}) + \beta_{a,1}\, \mathrm{dim}(t_a, \alpha_{a,1}) + \beta_{e,1}\, \mathrm{dim}(t_e, \alpha_{e,1}) \\
& + \beta_{w,1}\, \mathrm{dim}(t_w, \alpha_{w,1}) + \beta_{tr,1}\, \mathrm{dim}(t_{tr}, \alpha_{tr,1})
\end{aligned} \tag{15}$$

To get an overview of the behaviour of (15), a grid-search was carried out on the 5-dimensional cube $\alpha \in [1, 21]^5$ with $\alpha = (\alpha_{i,1}, \alpha_{a,1}, \alpha_{e,1}, \alpha_{w,1}, \alpha_{tr,1})$, resulting in a list of estimation results of the form $(-LL, \alpha_{i,1}, \alpha_{a,1}, \alpha_{e,1}, \alpha_{w,1}, \alpha_{tr,1})$. The (marginal) ranges of α for which β could be estimated are listed in Table 3.

Grid values: 3 5 7 9 11 13 15 17 19 21
Variable    Values for which a model could be estimated (x)
tc_inveh    x x x x x x x
tc_acc      x x x x x x x
tc_egr      x x x x x x x
tc_wait     x x x x x x x x x x
tc_trans    x x x x x x x x x x

Table 3: Parameter ranges for which MNL models could be estimated

[Figure 4: Contour plot of -Loglikelihood as a function of α parameters. Left panel axes: TC.INVEH and TC.ACC; right panel axes: TC.ACC and TC.EGR.]

Shown on the left is the contour plot of -Loglikelihood as a function of $\alpha_{a,1}$ (tc_acc) and $\alpha_{i,1}$ (tc_inveh). It is clear that the effect of tc_acc exceeds that of tc_inveh. Shown on the right is the contour plot of -L as a function of $\alpha_{a,1}$ (tc_acc) and $\alpha_{e,1}$ (tc_egr). In both cases multiple local minima exist.

The estimation results (both α and β parameters) for four estimation runs (three runs of model A and one of model B) are shown in Table 4. The top rows give the log-likelihood and the number of observations.
The first block of parameters contains the alpha parameters (the location of the inflexion points), the second block holds the beta parameters (the weights of the explanatory variables) for the part before the inflexion point, and the third block holds the coefficients of the dim terms, i.e. the change in weight that applies after the inflexion point. Run 1 of model A used grid-search, run 2 of model A used RS-LOPT, and run 3 of model A used MCS (but with a smaller search area than runs 1 and 2). All runs used the same explanatory variables, but where model A has 5 inflexion points, model B has three.

No.   Parameter     Model A, Run 1     Model A, Run 2     Model A, Run 3     Model B, Run 4
      LL            -613.938           -613.839           -613.870           -616.636
      n             1095               1095               1095               1095
      dof           11                 11                 11                 9
Alpha parameters
1     tc_inveh      10.0000            10.0002            10.0113            9.9988
2     tc_access     7.0000             6.9445             6.9593             6.9463
3     tc_egress     6.0000             6.2503             6.2254             6.2135
4     tc_wait       3.0000             2.9177             3.0000             0.0000
5     tc_transf     5.0000             14.6979            5.0014             0.0000
Beta parameters
12    t_inveh       -0.2639 (-4.6)     -0.2625 (-4.5)     -0.2640 (-4.6)     -0.2328 (-4.2)
13    t_access      -0.3869 (-10.8)    -0.3883 (-10.8)    -0.3876 (-10.8)    -0.3829 (-10.7)
14    t_egress      -0.3339 (-6.5)     -0.3210 (-6.6)     -0.3236 (-6.6)     -0.3215 (-6.6)
15    t_wait        0.2722 (1.0)       0.3431 (1.1)       0.2741 (1.0)       -0.1978 (-4.3)
16    t_transf      -0.2363 (-3.2)     -0.1869 (-3.2)     -0.2354 (-3.2)     -0.1522 (-2.7)
17    n_transf      -0.9443 (-3.7)     -1.0210 (-4.3)     -0.9461 (-3.7)     -1.1197 (-4.8)
122   t_inveh_2     0.1520 (2.5)       0.1435 (2.4)       0.1520 (2.5)       0.1168 (2.0)
132   t_access_2    0.2098 (2.9)       0.2089 (2.9)       0.2086 (2.9)       0.2112 (2.9)
142   t_egress_2    0.3857 (4.2)       0.3811 (4.1)       0.3875 (4.2)       0.3815 (4.1)
152   t_wait_2      -0.5257 (-1.7)     -0.5961 (-1.7)     -0.5280 (-1.7)
162   t_transf_2    0.1664 (1.6)       9.0933 (1.7)       0.1659 (1.6)

Table 4: Parameter estimation results for single inflexion point model (t-ratios in parentheses)

The parameter values for the inflexion points in tc_inveh, tc_access, tc_egress, and tc_wait are all close together across runs. The exception is the parameter tc_transf (transfer time), for which values of 5 and 14.7 were obtained in different runs.
After reducing the number of inflexion points, model B gave values for the inflexion points that confirm those of runs 1 and 3.

The β parameters have the expected sign, and correspond closely across all models for the variables t_inveh, t_access, t_egress, and n_transf; these parameter values also have high t-ratios. The parameter estimates for t_wait show more spread, and have lower t-ratios; the estimates for t_transf are clearly influenced by the high value of tc_transf in run 2.

Model B was obtained by eliminating the inflexion points for t_transf and t_wait from model A.

The result that the net weight coefficient for egress distance becomes positive for egress times greater than about 6.2 min is counter-intuitive. This counter-intuitive result was seen to reflect a feature of the data itself: for egress distances greater than about 450 m (10% of the total), the egress distance of the chosen alternative is consistently longer than that of the non-chosen alternatives. No correlation between large egress distance and other explanatory variables could be observed.

Previous researchers have voiced an (unpublished) explanation for this phenomenon in the case of schoolchildren: schoolchildren seem to disembark in groups a few stops before the school in order to walk to school together, which affects the weight of egress time. If separate models are estimated per travel purpose, in all cases the weight for egress time in the linear model is significantly lower than for access time. The large egress times in the chosen alternative are not offset by gains in the number of interchanges or total travel time. The phenomenon mainly occurs in intra-city relationships; very few regional relationships were involved. All observations were made in the a.m. period, most of them between 8 and 9.
This leads us to suspect the presence of unobserved explanatory variables. This particular feature of the data does not seem to have been noticed before, despite intensive study of the dataset.

We note that the values of the egress-time coefficient for the models with linear utility function in Table 2 are about half of those of access time. It seems that the parameter estimate for egress time in the earlier models reflects the fact that the linear model will interpolate the shape of the utility function that is present in the data. We see in Table 4 that to the left of the inflexion point the weight of egress time is closer to that of access time. Apparently the use of inflexion points has some value as a diagnostic tool, apart from any other considerations.

The issue of the location of the inflexion point for transfer time tc_transf was investigated by plotting the likelihood as a function of the location of the inflexion points; the plots are shown in Figure 5.

[Figure 5: Log-likelihood as a function of the location of the inflexion points]

The effects of access and egress times (alpha 2 and alpha 3) are particularly well-defined; the effect of in-vehicle time (alpha 1) less so. The effect of access time (alpha 2) and transfer time (alpha 5) shows two minima: the first one around 7 and 5 minutes, the second one around 15 and 14 minutes, with likelihood values that are close together. This explains why the MCS routine found two different values, and it suggests that two inflexion points are needed in the transformation of t_transf. The likelihood seems indifferent w.r.t. the location of inflexion points in the waiting time (alpha 4), so that this inflexion point was eliminated from the model.

Statistical significance

The models with inflexion points contain 10 parameters more than the linear model if the location of the inflexion point is counted as a parameter, and 5 if it is not.
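The likelihood-ratio comparisons in this subsection can be checked numerically from the log-likelihoods in Tables 2 and 4. The sketch below is an illustration under stated assumptions: `chi2_sf_even` is a hypothetical helper that uses the closed-form chi-square tail probability valid for even degrees of freedom only, the A-versus-B test is assumed to have 2 degrees of freedom (11 versus 9 parameters in Table 4), and run 3 is assumed to represent model A in that comparison.

```python
import math


def lr_statistic(ll_restricted, ll_unrestricted):
    """Likelihood-ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (ll_unrestricted - ll_restricted)


def chi2_sf_even(x, dof):
    """Upper-tail probability of a chi-square variable; closed form for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive, even number of degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


LL_LINEAR = -632.3      # Van der Waard model 3 (Table 2)
LL_A_RUN1 = -613.938    # model A, run 1 (Table 4)
LL_A_RUN3 = -613.870    # model A, run 3 (Table 4)
LL_B = -616.636         # model B, run 4 (Table 4)

lr_linear_vs_a = lr_statistic(LL_LINEAR, LL_A_RUN1)
p_linear_vs_a = chi2_sf_even(lr_linear_vs_a, 10)

lr_b_vs_a = lr_statistic(LL_B, LL_A_RUN3)
p_b_vs_a = chi2_sf_even(lr_b_vs_a, 2)

print("linear vs A: LR = %.3f, p = %.2g (10 dof)" % (lr_linear_vs_a, p_linear_vs_a))
print("B vs A:      LR = %.3f, p = %.3f (2 dof)" % (lr_b_vs_a, p_b_vs_a))
```

Under these assumptions the first statistic is about 36.7 and the second tail probability is about 0.063. The 5-parameter variant of the first test would need a chi-square routine that handles odd degrees of freedom, which this closed form does not cover.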
Two times the log-likelihood difference between two models is asymptotically $\chi^2_{10}$ or $\chi^2_{5}$ distributed; the corresponding critical points at the 95% confidence level are 18.307 and 11.0705, and at the 99% confidence level 23.2093 and 15.0863.

The log-likelihood of the linear model (model 3) is -632.3, and the log-likelihoods of runs 1 and 2 of model A are -613.938 and -613.839 respectively. The log-likelihood difference of the models with and without inflexion points is 18.5; two times this difference is about 37, so that in either case we can reject the null hypothesis of no inflexion points at the 95% and 99% confidence levels.

The log-likelihood difference between model A and model B is such that the null hypothesis can only be rejected at the 6.3% significance level, so that it is not possible to reject the null hypothesis that the coefficients of the inflexion points in t_wait and t_transf are zero.

Resulting form of the utility functions

Using the coefficient estimates shown in Table 4, the profiles of the utility contributions of access time and egress time were drawn up, and are shown in Figure 6. For access time, the inflexion point is at t = 7 min., and the slope of the utility function decreases after this point. This suggests that people weigh access time less heavily when it is greater than 7 minutes. Of course this may be a statistical phenomenon rather than a causal one, and could also signify that people who accept longer access times care less about them.

[Figure 6: Utility contribution of access time (left) and egress time (right). Left panel: single-inflexion point piecewise linear model, V against access time [min]. Right panel: utility contribution of egress time, utility against egress time [min].]

For egress time the inflexion point is at 6 min., and the egress time weight is seen to decrease, which is not realistic.
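The shape of these profiles follows directly from equation (9) and the Table 4 estimates. The sketch below evaluates the utility contribution and the net slope beyond the inflexion point; it assumes the model B (run 4) values, which may not be the exact estimates used to draw Figure 6, and the names `MODEL_B`, `contribution` and `slope_after` are hypothetical.

```python
def dim(x, alpha):
    return x - alpha if x >= alpha else 0.0


# Model B (run 4) values from Table 4: (beta, coefficient of the dim term, alpha).
MODEL_B = {
    "access": (-0.3829, 0.2112, 6.9463),
    "egress": (-0.3215, 0.3815, 6.2135),
}


def contribution(variable, minutes):
    """Utility contribution beta * t + beta_2 * dim(t, alpha) of one time component."""
    beta, beta_2, alpha = MODEL_B[variable]
    return beta * minutes + beta_2 * dim(minutes, alpha)


def slope_after(variable):
    """Net weight per minute beyond the inflexion point."""
    beta, beta_2, _ = MODEL_B[variable]
    return beta + beta_2


for name in ("access", "egress"):
    profile = [round(contribution(name, t), 3) for t in (0, 3, 6, 9, 12)]
    print(name, "net slope after inflexion:", round(slope_after(name), 4), profile)
```

With these values the net weight of access time beyond the inflexion point stays negative (about -0.17 per minute), whereas that of egress time turns slightly positive (about +0.06 per minute), which is the counter-intuitive feature discussed in the text.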
Comparison of the difference in egress time between the chosen and the non-chosen alternative (see Figure 7) shows that there are only 5 data points with egress time greater than 12 min., which also differ considerably in their difference between egress time for the chosen and non-chosen alternatives (highlighted with an ellipse in Figure 7). These data points were removed from the dataset and the model was re-estimated to test their influence on the results. The estimation results for the trimmed dataset are shown in Table 5. The parameter estimate for egress time to the right of the inflexion point has decreased from 0.382 to 0.299. In this case the piecewise linear model is sensitive to outliers.

As can be seen from Figure 8, the contribution to the utility function is now more reasonable, except for purpose “school”, for which a tentative explanation was presented earlier.

[Figure 7: Utility difference between egress time of chosen and non-chosen alternatives versus egress time of the alternative. Axes: EGRES1 and EGRDIF21.]

                 Standard           Piecewise linear    Piecewise linear (trimmed)
LL               -632.3             -616.636            -616.636
n                1095               1095                1095
dof              6                  9                   9
Alpha parameters
tc_inveh                            9.9988              9.9988
tc_access                           6.9463              6.9463
tc_egress                           6.2135              6.2135
Beta parameters
t_inveh          -0.1283 (-9)       -0.2328 (-4.2)      -0.2347 (-4.2)
t_access         -0.2999 (-12.2)    -0.3829 (-10.7)     -0.3854 (-10.7)
t_egress         -0.1524 (-5.2)     -0.3215 (-6.6)      -0.3067 (-6.1)
t_wait           -0.1977 (-4.4)     -0.1978 (-4.3)      -0.2016 (-4.4)
t_transf         -0.1491 (-2.7)     -0.1522 (-2.7)      -0.1507 (-2.7)
n_transf         -1.056 (-4.6)      -1.1197 (-4.8)      -1.1250 (-4.9)
t_inveh_2                           0.1168 (2.0)        0.1191 (2.0)
t_access_2                          0.2112 (2.9)        0.2167 (3.0)
t_egress_2                          0.3815 (4.1)        0.2994 (2.9)

Table 5: Estimation results after removal of outlying data points (t-ratios in parentheses)

[Figure 8: Contribution of egress time after trimming. Panel title: utility contribution of egress time (after removing 5 highest egress time obs. out of 1095). Weighted egress time against egress time [min.], with separate curves for School, Shop and Work.]

Two inflexion points per explanatory variable

The results from the extension of the Van der Waard models with a single inflexion point per explanatory variable suggest (see Figure 5, right-hand plot) that a model specification with two inflexion points can be considered. This was checked by estimating the following model:

$$\begin{aligned}
V ={} & \beta_a t_a + \beta_w t_w + \beta_i t_i + \beta_e t_e + \beta_{tr} t_{tr} + \beta_{wlktr} t_{wlktr} + \beta_{ntr} n_{tr} \\
& + \beta_{i,1}\, \mathrm{dim}(t_i, \alpha_{i,1}) + \beta_{a,1}\, \mathrm{dim}(t_a, \alpha_{a,1}) + \beta_{e,1}\, \mathrm{dim}(t_e, \alpha_{e,1}) + \beta_{w,1}\, \mathrm{dim}(t_w, \alpha_{w,1}) + \beta_{tr,1}\, \mathrm{dim}(t_{tr}, \alpha_{tr,1}) \\
& + \beta_{i,2}\, \mathrm{dim}(t_i, \alpha_{i,2}) + \beta_{a,2}\, \mathrm{dim}(t_a, \alpha_{a,2}) + \beta_{e,2}\, \mathrm{dim}(t_e, \alpha_{e,2}) + \beta_{w,2}\, \mathrm{dim}(t_w, \alpha_{w,2}) + \beta_{tr,2}\, \mathrm{dim}(t_{tr}, \alpha_{tr,2})
\end{aligned} \tag{16}$$

The location of the second inflexion point was in the 99th-percentile range of the explanatory variables, and is therefore quite sensitive to outlying observations. Clearly there is only room for a single inflexion point. For this reason the model specified in (16) was not investigated further.

The Box-Cox transformation

As with the piecewise linear function, model (3) from (14) has been selected as the basis for extension. For the variables $t_a$, $t_e$, $t_i$, $t_w$ the Box-Cox transform was calculated: $t_a \to t_a^{(\lambda_a)}$, $t_e \to t_e^{(\lambda_e)}$, $t_i \to t_i^{(\lambda_i)}$, $t_w \to t_w^{(\lambda_w)}$.

The following models were estimated:

$$V = \beta_a t_a + \beta_w t_w + \beta_i t_i^{(\lambda_i)} + \beta_e t_e + \beta_{tr} t_{tr} + \beta_{wlktr} t_{wlktr} + \beta_{ntr} n_{tr} \tag{17}$$

$$V = \beta_a t_a^{(\lambda_a)} + \beta_w t_w + \beta_i t_i^{(\lambda_i)} + \beta_e t_e^{(\lambda_e)} + \beta_{tr} t_{tr} + \beta_{wlktr} t_{wlktr} + \beta_{ntr} n_{tr} \tag{18}$$

In theory the advantage of this approach should be to show whether or not the non-linearity adds to the explanatory value of the model.

The estimation of the model shown in (17) gave λ = 0.5, which is broadly in agreement with the results from the piecewise linear model.
However, the estimation of the model shown in (18) resulted in λ = 2.5 for in-vehicle time and λ = 1.5 for access and egress time. Unfortunately these results lack credibility for the following reasons:
• the λ value of 2.5 for in-vehicle time, and 1.5 for access and egress time, suggests an increase in the marginal disutility of a minute of access, egress or in-vehicle time, where model (17), the piecewise linear model and common sense suggest a decrease in marginal disutility;
• values of λ > 1 imply that the error term ε_n in (1) should decrease with increasing in-vehicle time, which is not visible in plots of the utility difference between chosen and non-chosen alternatives (according to the Van der Waard model 2) as a function of trip length, total trip length, or disutility of the chosen alternative, as shown in Figure 9.

[Plot: UDIFW3 (vertical axis) against U1W2 (horizontal axis)]
Figure 9: Disutility difference (chosen - non-chosen) by disutility of chosen alternative

In addition, the estimation results for λ were found to be unstable between model specifications, which suggests that the Box-Cox transformation should be used with caution.

3 CONCLUSIONS

The addition of non-linearities “costs” extra parameters; all parameters were determined using Maximum Likelihood (ML) estimation. Instead of writing software from scratch to estimate all parameters together, the ML problem was split into ordinary MNL estimation (Problem B) and determination of the parameters that define the non-linearity (Problem A). It was found that standard local minimisation routines that rely on derivative information would not work satisfactorily on Problem A, and that Problem A has multiple local minima. Our conclusion is that Problem A has to be solved using global minimisation routines. Problem A was solved using grid-search, random search, RS-LOPT, and the MCS global optimisation software.
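The split into Problem A and Problem B can be sketched as a nested search. The example below is a minimal illustration under several assumptions that are not taken from the study: `dim(t, alpha)` is assumed to be the positive part max(t - alpha, 0); the choice is binary, with a single time attribute; the data are synthetic, with hypothetical "true" values; Problem B is solved by a hand-written Newton-Raphson routine with step halving rather than by the estimation software used for the paper; and Problem A is solved by grid search only.

```python
import math
import random

def dim(t, alpha):
    # Assumed form of dim(t, alpha): the positive part of (t - alpha).
    return max(t - alpha, 0.0)

def log_sigmoid(s):
    # Numerically stable ln(1 / (1 + exp(-s))).
    return -math.log1p(math.exp(-s)) if s >= 0 else s - math.log1p(math.exp(s))

def loglik(b, X, y):
    # Binary-logit log-likelihood; each x holds attribute differences (alt. 1 - alt. 2).
    total = 0.0
    for x, yi in zip(X, y):
        v = b[0] * x[0] + b[1] * x[1]
        total += log_sigmoid(v if yi else -v)
    return total

def fit_logit(X, y, iters=100):
    # Problem B: Newton-Raphson with step halving for a two-parameter logit.
    b = [0.0, 0.0]
    ll = loglik(b, X, y)
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(X, y):
            p = math.exp(log_sigmoid(b[0] * x[0] + b[1] * x[1]))
            w = p * (1.0 - p)
            g0 += (yi - p) * x[0]
            g1 += (yi - p) * x[1]
            h00 += w * x[0] * x[0]
            h01 += w * x[0] * x[1]
            h11 += w * x[1] * x[1]
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # regressors (nearly) collinear for this alpha
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        while step > 1e-8:
            cand = [b[0] + step * s0, b[1] + step * s1]
            cand_ll = loglik(cand, X, y)
            if cand_ll >= ll:
                break
            step /= 2.0
        else:
            break  # no improving step found
        improved = cand_ll - ll
        b, ll = cand, cand_ll
        if improved < 1e-10:
            break
    return b, ll

def profile_search(t1, t2, y, grid):
    # Problem A: grid search over the inflexion point alpha.
    best = None
    for alpha in grid:
        X = [(a - c, dim(a, alpha) - dim(c, alpha)) for a, c in zip(t1, t2)]
        b, ll = fit_logit(X, y)
        if best is None or ll > best[2]:
            best = (alpha, b, ll)
    return best

# Synthetic data with hypothetical "true" parameters.
random.seed(1)
N, TRUE_ALPHA, B, B1 = 2000, 8.0, -0.3, 0.2
t1 = [random.uniform(0, 20) for _ in range(N)]
t2 = [random.uniform(0, 20) for _ in range(N)]
y = []
for a, c in zip(t1, t2):
    v = B * (a - c) + B1 * (dim(a, TRUE_ALPHA) - dim(c, TRUE_ALPHA))
    y.append(1 if random.random() < math.exp(log_sigmoid(v)) else 0)

grid = [float(a) for a in range(2, 17)]
alpha_hat, beta_hat, ll_hat = profile_search(t1, t2, y, grid)
print(alpha_hat, beta_hat, ll_hat)
```

For a fixed inflexion point the inner problem is an ordinary logit estimation, which is why it can be delegated to standard MNL software. The outer log-likelihood as a function of the inflexion point is in general not smooth, so in this sketch it is only evaluated on a grid and not differentiated.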
Of the local optimisation methods tried in RS-LOPT, only the derivative-free Simplex and MCS methods converged, the Simplex method very slowly. The MCS routine found the same optima as grid-search and RS-LOPT, but quicker by a factor of 60. Although we are reluctant to put trust in global minimisation routines (and recommend at least some checks based on grid-search or RS), estimating the location of inflexion points using the MCS optimisation routine as described appears feasible.

The extra logit parameters that define inflexion points are statistically different from zero, and lead to improved model fit. The results are behaviourally credible in the sense that the disutility of a small increase in access or egress time varies with the total amount of access/egress time already experienced. A counterintuitive result w.r.t. egress time was found, but turns out to be a feature of the data and could be due to unobserved variables.

The dataset under consideration shows evidence of non-linearity in the valuation of access time, egress time, and in-vehicle time for route choice in PT networks. Models with two inflexion points could not be estimated on this dataset. Models using the Box-Cox transformation were also estimated, but gave unstable or counter-intuitive results.

We note that the current dataset, being based on urban public transport trips in radial corridors in densely populated areas of The Netherlands, contains only short trips with short egress and access times. This dataset is not representative of interurban (regional) transport, so that the findings of non-linearities in the utility function cannot be generalised.

4 REFERENCES

Amemiya, T. (1985) Advanced econometrics. Harvard University Press.

Ben-Akiva, M., Lerman, S.R. (1989) Discrete choice analysis. Theory and application to travel demand. MIT Press.

Blajac, T., Causse, A. (1998) Valeur du temps de transport: l’apport de la modelisation micro-economique du choix modal. Research report No. DT 98-01, Laboratoire Montpellieran d’Economique theorique et appliquee, University of Montpellier, France.

Blajac, T., Causse, A. (2001) Value of travel time: a theoretical legitimization of some nonlinear representative utility in discrete choice models. Transp. Res. B 35 (2001) pp. 391-400.

CTPS (1997) Transfer penalties in urban mode choice modelling. Report prepared for the US Department of Transportation, Federal Transit Administration, Federal Highway Administration, Office of the Secretary, US Environmental Protection Agency under the Travel Model Improvement Program (TMIP).

Eliasson (2000) Transport and location analysis. Ph.D. thesis. Royal Institute of Technology, Dept. of Infrastructure and Planning, Transport and Location Analysis division, Stockholm, Sweden. Report TRITA-IP FR 00-79.

Gaudry, M.J.I., Wills, M. (1978) Estimating the functional form of travel demand models. Transpn. Res. Vol. 12, pp. 257-289.

HCG (1995) Alogit User’s guide. Hague Consulting Group, The Netherlands.

Huyer, W., Neumaier, A. (1999) Global optimisation by multilevel coordinate search. SIAM Journal of Global Optimisation 14 (1999), 331-355.

Lotan, T., Koutsopoulos, H.N. (1993) Models for route choice behaviour in the presence of information using concepts from fuzzy set theory and approximate reasoning. Transportation 20: 129-155, 1993.

Mandel, B., Gaudry, M., Rothengatter, W. (1993) A disaggregate Box-Cox logit mode choice model of intercity passenger travel in Germany. Departement de sciences economiques; cahier 9307, ISSN 0709-9231.

Pinter, J.D. (1996) Global optimisation in action. Kluwer, Dordrecht.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P. (1995) Numerical recipes in C. (Second edition). Cambridge University Press.

Van der Waard, J. (1988) Onderzoek weging tijdelementen. Deelrapport 2, Dataverzameling en verwerking (In Dutch). Delft University of Technology, Faculty of Civil Engineering, Transportation Planning section. Report no. VK 5302.302.