Local Transport Today is the authoritative, independent journal for transport decision makers. Analysis, Comment & News on Transport Policy, Planning, Finance and Delivery since 1989.

Big data for transport: processing and fusion

The modelling world is excited at the prospect of using novel data sources. But despite their potential advantages, there has not been a massive shift to use them in practical applications. When this has been tried, users report mixed success. Luis Willumsen and Miguel Picornell, Kineo Mobility Analytics, discuss the processing and fusion of mobile phone data to address a transport issue in Spain

Luis Willumsen and Miguel Picornell
Kineo is currently involved in a study to improve the performance of an existing toll road in Spain involving the processing and fusion of mobile phone data
The different processes required when using mobile phone data to achieve value
The figure above shows one example of these estimates at different times of an average day and contrasts them with the observed flow profile


There are many new sensors that can be used to collect travel data or produce digital traces that in turn can be processed to deliver this type of information. CCTV, smart cards, bank cards, Global Positioning Systems (GPS), mobile phones, Bluetooth devices and WiFi all leave digital footprints that offer the opportunity of collecting travel data at low cost.

Conventional methods such as Household Travel Surveys (HTS) and intercept surveys are not error-free, and the observations are often disappointingly limited and sparse. However, we know how to cope with these issues and can 'muddle through'. We have not yet developed an appropriate adaptation of new data to our specific needs, nor established how to make good use of these new sources. We must ask: are these data sources equally usable? What are their limitations? Can we use them directly in our models? Can we develop better models and solve other problems we were unable to address in the past?

Big data sources

It is useful to classify the new data sources into three groups: 

  • Purposeful methods. These require taking some action to locate sensors and collect the data. Examples are number plate matching using CCTV and the matching of Bluetooth MAC addresses.

  • Opportunistic methods. These involve using data that is already generated for other purposes but can be processed to obtain the desired mobility indicators. Anonymised mobile phone and smartcard data are examples of these.

  • Crowd-sourced methods. These use data that many individual users volunteer, for example through Waze.

Another way of classifying methods is to look at whether they provide true Origin to Destination data (as with HTS or intercept surveys) or only partial information as with number plate or Bluetooth matching. All these data sources must be processed in different ways before they can add value to our practice. 

Extracting and cleaning useful information from these data sources has to deal with elimination of errors, storage issues, timeliness and protection of privacy and anonymity. Not all data provides enough information and some must be rejected, resulting in a smaller sample than originally envisaged. For example, some mobile phone data reflects no complete movements during a day for a number of reasons, including the lack of travel. Algorithms must be designed to filter these in an efficient manner.
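The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the algorithm used by any operator or by Kineo: the record layout `(user_id, day, cell_id)`, the function name and the threshold of two distinct cells are all assumptions made for the example.

```python
from collections import defaultdict

def users_with_movement(records, min_distinct_cells=2):
    """Return the (user, day) pairs observed in at least `min_distinct_cells` cells.

    `records` is an iterable of (user_id, day, cell_id) tuples. This layout is
    hypothetical; real CDR extracts carry more fields and need more checks.
    """
    cells_seen = defaultdict(set)
    for user_id, day, cell_id in records:
        cells_seen[(user_id, day)].add(cell_id)
    # A user-day seen in a single cell shows no movement and is rejected.
    return {key for key, cells in cells_seen.items()
            if len(cells) >= min_distinct_cells}

# Illustrative data only: user "a" changes cell, user "b" never does.
sample = [
    ("a", "day1", "c1"),
    ("a", "day1", "c2"),
    ("b", "day1", "c1"),
    ("b", "day1", "c1"),
]
kept = users_with_movement(sample)
print(kept)
```

In practice the rejection rule would need to be richer (for example, distinguishing a phone that stayed at home from one that was switched off), which is one reason the usable sample ends up smaller than the raw one.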

Analysis and modelling are then required to adapt the data to the needs of the modeller. For example, mobile phone data is initially geocoded to cells, but will be used in traffic zones, so a conversion process is needed. The same applies to the identification of zones for home, place of work or study and other activities in order to produce journey purposes.
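One simple way to picture the cell-to-zone conversion is as a proportional split: each cell's trips are shared among the traffic zones it overlaps. The sketch below assumes that the shares (here called `cell_zone_share`, a hypothetical name) have already been derived, for instance from overlapping areas or population, and that they sum to one for each cell. It is an illustration of the idea, not the conversion used in the study.

```python
def cells_to_zones(cell_trips, cell_zone_share):
    """Distribute trips recorded per cell over traffic zones.

    cell_trips:      {cell_id: number of trips}
    cell_zone_share: {cell_id: {zone_id: share}}, shares summing to 1 per cell
                     (assumed to be known, e.g. from area or population overlap).
    """
    zone_trips = {}
    for cell, trips in cell_trips.items():
        for zone, share in cell_zone_share[cell].items():
            zone_trips[zone] = zone_trips.get(zone, 0.0) + trips * share
    return zone_trips

# Illustrative numbers: cell c1 straddles zones Z1 and Z2, cell c2 lies inside Z2.
cell_trips = {"c1": 100.0, "c2": 40.0}
cell_zone_share = {"c1": {"Z1": 0.75, "Z2": 0.25}, "c2": {"Z2": 1.0}}
zone_trips = cells_to_zones(cell_trips, cell_zone_share)
print(zone_trips)
```

Because the shares sum to one, the total number of trips is preserved by the conversion; only their geographic allocation changes.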

Finally, the data only adds value in an application like a transport model and this requires interpretation. Visualisation tools are very important to obtain the best value from these processes.

Using mobile phone data

One must consider two types of resolution in the use of mobile phone data: geographic and temporal. In principle, the minimum geographic resolution is provided by the location of the antennas and the construction of Voronoi polygons around them. The actual cells are more detailed than this, as most antennas are directional. Moreover, if sufficient data is retained, it is possible to pinpoint a location by triangulation of signals from more than one antenna. Each activity on a mobile phone generates a data point in time associated with a location. Call Detail Records (CDRs) are collected to charge users, and therefore each call, SMS or data transmission generates such a point. In order to improve operations, some mobile phone companies are installing probes that check the location of each phone with some regularity, thus providing greater time granularity. Probe data is often discarded a short time after use, so specific agreements must be reached to preserve it for further use.
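A point lies inside the Voronoi polygon of an antenna exactly when that antenna is the nearest one, so the basic geographic resolution can be illustrated without constructing any polygons. The sketch below uses made-up planar coordinates and a hypothetical `nearest_antenna` helper; it ignores antenna direction and triangulation, both of which would refine the result.

```python
import math

def nearest_antenna(point, antennas):
    """Return the id of the antenna closest to `point`.

    This is equivalent to finding the Voronoi polygon that contains the point.
    `antennas` maps antenna id -> (x, y) in any planar coordinate system.
    """
    return min(antennas, key=lambda a: math.dist(point, antennas[a]))

# Illustrative antenna positions (arbitrary units), not real sites.
antennas = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
cell = nearest_antenna((7.0, 1.0), antennas)
print(cell)
```

With thousands of antennas a spatial index would be preferable to this linear scan, but the principle is the same.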

Anonymised data from several days is used to identify regular travel patterns and therefore homes, places of work or study. Once a home zone is identified, it is possible to use census data to expand the sample and gain some idea of the socio-economic group of travellers. It is also necessary to recognise that the movements detected are not of people but of mobile phones, or more specifically SIMs that may be on a phone, tablet or other devices.
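The two steps in this paragraph, inferring a home zone and expanding the sample with census data, can be sketched as follows. The night-time window (22:00 to 06:00), the function names and the figures are assumptions for illustration only; real home detection relies on several days of data and more careful rules.

```python
from collections import Counter

def home_zone(sightings, night_start=22, night_end=6):
    """Guess a home zone as the zone most often seen during night hours.

    `sightings` is a list of (hour_of_day, zone_id) pairs pooled over several
    days. The night window is an assumed heuristic, not a standard.
    """
    night = [zone for hour, zone in sightings
             if hour >= night_start or hour < night_end]
    if not night:
        return None  # no night-time observations: home cannot be inferred
    return Counter(night).most_common(1)[0][0]

def expansion_factors(homes, census_population):
    """Census residents per detected SIM, by home zone."""
    detected = Counter(zone for zone in homes.values() if zone is not None)
    return {zone: census_population[zone] / n for zone, n in detected.items()}

# Illustrative data: four SIMs, one of which is never seen at night.
sightings_by_user = {
    "u1": [(23, "Z1"), (2, "Z1"), (10, "Z2")],
    "u2": [(1, "Z1"), (14, "Z3")],
    "u3": [(12, "Z2")],
    "u4": [(23, "Z2")],
}
homes = {user: home_zone(s) for user, s in sightings_by_user.items()}
factors = expansion_factors(homes, {"Z1": 1000, "Z2": 400})
print(homes, factors)
```

Note that the factors expand SIMs, not people: a traveller carrying two devices, or none, distorts the result, which is why the caveat in the text matters.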

It must be noted that the cell structure is different from the zoning system employed by our models, and there is no one-to-one relationship between them. Moreover, mobile phone data can be used to detect movements and infer trip matrices, but some movements of people do not fit into Origin-Destination matrices: people wander, get lost, make several successive stops and so on. Some movements (never detected previously at roadside interviews) are exclusively on minor roads, not all of which are included in our models: transferring them to the model network creates unusual routings not reflected in traffic counts.

A transport model expects trip matrices defined for a particular time period, and intercept surveys are geared to provide this. Mobile phone movements record the time at each location. The conversion of these movements to timed trip matrices is not trivial and requires understanding the transport model and the travel times between points without observations.
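To illustrate why this conversion needs travel times, consider a phone last seen at its origin at one time and first seen at its destination later. The trip must have started somewhere between the last origin sighting and the first destination sighting minus the travel time. The sketch below takes the midpoint of that window as the departure estimate; the midpoint rule, the function names and the one-hour periods are assumptions for the example, not the method used in the study.

```python
def departure_window(last_seen_origin, first_seen_destination, travel_time):
    """Earliest and latest feasible departure times, in minutes after midnight.

    Returns None when the sightings are inconsistent with the travel time.
    """
    latest = first_seen_destination - travel_time
    if latest < last_seen_origin:
        return None
    return last_seen_origin, latest

def assign_period(window, period_length=60):
    """Index of the modelled time period containing the window's midpoint."""
    earliest, latest = window
    midpoint = (earliest + latest) / 2
    return int(midpoint // period_length)

# Illustrative trip: last seen at origin 07:40, first seen at destination 09:10,
# with an assumed network travel time of 30 minutes.
window = departure_window(460, 550, 30)
period = assign_period(window)
print(window, period)
```

The wider the gap between sightings, the less certain the allocation to a time period, which is where the greater time granularity of probe data would help.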

In general, to identify the mode of travel and deal with vehicles rather than people, at Kineo we have found it necessary to employ traffic counts and other complementary data sources; these depend mostly on the application in hand. In the same way, fusion with smartcard data helps in identifying public transport users and movements. 

An application

As an example of such processing and fusion of mobile phone data to address a specific transport problem, Kineo is currently involved in a study to improve the performance of an existing toll road in Spain. The main objective was to obtain Origin-Destination matrices for different times of the day (and different days) and use them to calibrate and run a network model testing alternative optimisation strategies. Therefore, the work undertaken goes beyond the production of trip matrices and becomes involved in their use. This work was implemented in close collaboration with the final client.

The data used were anonymised CDRs from Orange. We also used network and travel times between points from other sources. Additionally we constructed three screenlines with traffic counts available, again from different sources. 

Although the transport model used fewer than 200 zones, our mobile phone movements model dealt with some 15,000 'zones'. Our network model was also more detailed than the simplified transport network, as it contained all roads. We were able to process our expanded mobile phone data and estimate, with assumptions about vehicle occupancy, an upper and lower range for the flows crossing each of the three screenlines.
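The arithmetic behind such a range is simple: dividing expanded person movements by a high occupancy gives the lower bound on vehicles, and dividing by a low occupancy gives the upper bound. The occupancy values and flows below are invented for illustration and are not results from the study.

```python
def vehicle_flow_range(person_trips, occupancy_low, occupancy_high):
    """Lower and upper vehicle flows implied by a range of occupancies.

    Higher occupancy means fewer vehicles, so it produces the lower bound.
    """
    return person_trips / occupancy_high, person_trips / occupancy_low

def straddles(observed, flow_range):
    """True if the observed count lies within the estimated range."""
    low, high = flow_range
    return low <= observed <= high

# Illustrative figures only: 1,200 expanded person movements across a
# screenline in one period, with assumed occupancies of 1.25 and 1.5.
flow_range = vehicle_flow_range(1200.0, 1.25, 1.5)
print(flow_range, straddles(900, flow_range))
```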

Our high and low estimates mostly (not always) straddle the observations. Of course, now it is possible to use matrix estimation techniques and additional classified counts to achieve two improvements:  identification of vehicle types and refinement of the trip matrices using different values of time. The latter is helped by the fact that in most cases we were able to identify the route and therefore which vehicle pays the toll and which one does not.

Future opportunities

Mobile phone data on its own is not enough to deliver fully actionable trip matrices. However, by integrating and blending mobile phone data with other data sources, it is possible to deliver reasonably accurate trip matrices that require only refinement during calibration of the final transport model. The trip matrices obtained from the fusion of data sources require, in our view, fewer adjustments (for example to fill empty cells) than those produced by conventional methods.

Current transport models operate at a higher degree of aggregation than the anonymised data available from mobile phone operations. This is partly because of the limitations of current data collection methods, and partly because a greater regularity of trips is expected at that level. Mobile phone data can provide greater geographic and temporal disaggregation, opening new opportunities with respect to current data sources, such as building models for specific days and times and the analysis of recurrent and non-recurrent trips and their variability.

Moreover, mobile phone and smartcard data give us, for the first time, the opportunity to learn from spontaneous experiments: how do people adjust over time to temporary disruptions such as an underground service interrupted for a month, or the sudden flooding of a main route? The dynamics of change would be visible and researchable.

Achieving these benefits requires not only a good understanding of mobile phone data and how to filter out errors, identify activities and trips, and reliably expand the sample, but also knowledge of how the data can be enriched and used in the final model. The authors do not believe that the field is mature enough yet to deliver standard processes and matrices independently from the application that will use them. For the time being, close collaboration with the end client is highly beneficial and perhaps essential.

Luis Willumsen and Miguel Picornell, Kineo Mobility Analytics

Discuss this at Modelling World 2016
